Learning Automata with Hankel Matrices

1 Learning Automata with Hankel Matrices Borja Balle Amazon Research Cambridge Highlights London, September 2017

2 Weighted Finite Automata (WFA) (over R)

Graphical representation: [figure: a two-state automaton with initial and final weights and weighted transitions labelled a and b]

Algebraic representation: A = ⟨α, β, {A_a}⟩, where α, β ∈ R^n are the initial and final weight vectors and each A_a ∈ R^{n×n} is the transition matrix of a symbol a ∈ Σ (the automaton in the figure corresponds to one such triple with n = 2)

Behavioral representation: Each WFA A computes a function A : Σ* → R given by A(x_1 ⋯ x_T) = α^⊤ A_{x_1} ⋯ A_{x_T} β
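
To make the behavioral representation concrete, here is a minimal numpy sketch of how a WFA evaluates a string. The two-state weights below are illustrative assumptions and are not meant to reproduce the automaton drawn on the slide.

```python
import numpy as np

# Hypothetical two-state WFA over the alphabet {a, b} (illustrative weights only).
alpha = np.array([1.0, 0.0])                    # initial weight vector
beta = np.array([0.0, 1.0])                     # final weight vector
A = {"a": np.array([[0.5, 0.2], [0.0, 0.3]]),   # one transition matrix per symbol
     "b": np.array([[0.1, 0.0], [0.4, 0.6]])}

def wfa_value(x, alpha, beta, A):
    """Compute A(x_1 ... x_T) = alpha^T A_{x_1} ... A_{x_T} beta."""
    v = alpha
    for symbol in x:
        v = v @ A[symbol]
    return float(v @ beta)

print(wfa_value("abba", alpha, beta, A))
```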

3 In This Talk...

- Describe a core algorithm common to many algorithms for learning weighted automata
- Explain the role this core plays in three learning problems in different setups
- Survey extensions to more complex models and some applications

4 Outline 1. From Hankel Matrices to Weighted Automata 2. From Data to Hankel Matrices 3. From Theory to Practice

5 Outline 1. From Hankel Matrices to Weighted Automata 2. From Data to Hankel Matrices 3. From Theory to Practice

6 Hankel Matrices and Fliess' Theorem

Given f : Σ* → R, define its Hankel matrix H_f ∈ R^{Σ*×Σ*} by H_f(p, s) = f(p·s); for instance, the rows and columns indexed by ε, a, b, ... start

        ε      a      b     ⋯
  ε   f(ε)   f(a)   f(b)
  a   f(a)   f(aa)  f(ab)
  b   f(b)   f(ba)  f(bb)
  ⋮

Theorem [Fli74]
1. The rank of H_f is finite if and only if f is computed by a WFA
2. The rank rank(f) := rank(H_f) equals the number of states of a minimal WFA computing f
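
To make the definition concrete, the short sketch below builds a finite Hankel block for a toy function (an assumption chosen for illustration, f(x) = 0.5^|x|) and checks the rank statement of Fliess' theorem numerically.

```python
import numpy as np

def f(x):
    # Toy function: f(x) = 0.5 ** |x|, computed by a one-state WFA
    # with alpha = beta = 1 and A_a = A_b = 0.5.
    return 0.5 ** len(x)

prefixes = ["", "a", "b", "ab"]
suffixes = ["", "a", "b"]
H = np.array([[f(p + s) for s in suffixes] for p in prefixes])  # H_f(p, s) = f(p s)
print(np.linalg.matrix_rank(H))  # prints 1, matching the one-state WFA
```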

7 The Structure of Hankel Matrices

A(p_1 ⋯ p_T s_1 ⋯ s_{T'}) = (α^⊤ A_{p_1} ⋯ A_{p_T}) (A_{s_1} ⋯ A_{s_{T'}} β), so H(p, s) = f(p·s) factors as H = P S, where row p of P is α^⊤ A_{p_1} ⋯ A_{p_T} and column s of S is A_{s_1} ⋯ A_{s_{T'}} β

A(p_1 ⋯ p_T a s_1 ⋯ s_{T'}) = (α^⊤ A_{p_1} ⋯ A_{p_T}) A_a (A_{s_1} ⋯ A_{s_{T'}} β), so H_a(p, s) = f(p·a·s) factors as H_a = P A_a S

Algebraically, factorizing H lets us solve for A_a:
H = P S  ⟹  H_a = P A_a S  ⟹  A_a = P^+ H_a S^+
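
The sketch below illustrates this factorization on the same toy WFA as in the earlier snippet: it builds P from forward vectors and S from backward vectors over small, arbitrarily chosen prefix and suffix sets, checks H = P S and H_a = P A_a S, and recovers A_a with pseudo-inverses.

```python
import numpy as np

# Same toy two-state WFA as before (illustrative weights).
alpha = np.array([1.0, 0.0])
beta = np.array([0.0, 1.0])
A = {"a": np.array([[0.5, 0.2], [0.0, 0.3]]),
     "b": np.array([[0.1, 0.0], [0.4, 0.6]])}

def forward(p):                      # row of P: alpha^T A_{p_1} ... A_{p_T}
    v = alpha
    for c in p:
        v = v @ A[c]
    return v

def backward(s):                     # column of S: A_{s_1} ... A_{s_T'} beta
    v = beta
    for c in reversed(s):
        v = A[c] @ v
    return v

prefixes, suffixes = ["", "a", "b"], ["", "a", "b"]
P = np.stack([forward(p) for p in prefixes])            # |P| x n
S = np.stack([backward(s) for s in suffixes], axis=1)   # n x |S|

H = P @ S                    # H(p, s)   = f(p s)
H_a = P @ A["a"] @ S         # H_a(p, s) = f(p a s)

A_a = np.linalg.pinv(P) @ H_a @ np.linalg.pinv(S)       # A_a = P^+ H_a S^+
print(np.allclose(A_a, A["a"]))                         # True when P and S have rank n
```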

8 SVD-based Reconstruction [HKZ09; Bal+14]

Inputs:
- Desired number of states r
- Basis B = (P, S) with P, S ⊂ Σ*, ε ∈ P ∩ S
- Finite Hankel blocks indexed by the prefixes and suffixes in B: H_B ∈ R^{P×S} and H_B^Σ = {H_B^a ∈ R^{P×S} : a ∈ Σ}

Algorithm Spectral(H_B, H_B^Σ, r):
1. Compute the rank-r SVD H_B ≈ U D V^⊤
2. Let A_a = D^{-1} U^⊤ H_B^a V
3. Let α = V^⊤ H_B(ε, :) and β = D^{-1} U^⊤ H_B(:, ε)
4. Return A = ⟨α, β, {A_a}⟩

Running time:
1. SVD takes O(|P| |S| r)
2. Matrix multiplications take O(|Σ| |P| |S| r)
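
A minimal numpy sketch of the Spectral routine described above follows; it assumes ε is indexed by row/column 0 of the blocks and uses only the SVD and matrix products from the slide.

```python
import numpy as np

def spectral(H_B, H_B_Sigma, r, eps_row=0, eps_col=0):
    """Spectral(H_B, H_B_Sigma, r): rank-r SVD reconstruction of a WFA.

    H_B is a |P| x |S| array, H_B_Sigma maps each symbol a to its |P| x |S|
    block, and the empty string is assumed to sit at the given row/column.
    """
    U, d, Vt = np.linalg.svd(H_B, full_matrices=False)
    U, d, Vt = U[:, :r], d[:r], Vt[:r, :]              # rank-r truncation: H_B ~ U D V^T
    D_inv = np.diag(1.0 / d)
    A = {a: D_inv @ U.T @ H_a @ Vt.T                   # A_a = D^-1 U^T H_B^a V
         for a, H_a in H_B_Sigma.items()}
    alpha = Vt @ H_B[eps_row, :]                       # alpha = V^T H_B(eps, :)
    beta = D_inv @ U.T @ H_B[:, eps_col]               # beta  = D^-1 U^T H_B(:, eps)
    return alpha, beta, A
```

Feeding in exact Hankel blocks of rank r built from a WFA (one block per symbol, as in the previous snippet) recovers an equivalent automaton up to a change of basis, which is the recovery property stated on the next slide.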

9 Properties of Spectral [HKZ09; Bal13; BM15a]

Consistency: If P is prefix-closed, S is suffix-closed, and r = rank(H_B) = rank([H_B H_B^Σ]), then the WFA A = Spectral(H_B, H_B^Σ, r) satisfies A(p·s) = H_B(p, s) and A(p·a·s) = H_B^a(p, s) for all p ∈ P, a ∈ Σ, s ∈ S

Recovery: If H_B and H_B^Σ are sub-blocks of H_f with r = rank(f) = rank(H_B), then the WFA A = Spectral(H_B, H_B^Σ, r) satisfies A = f

Robustness: If r = rank(H_B) = rank([H_B H_B^Σ]), ‖H_B − Ĥ_B‖ ≤ ε, and ‖H_B^a − Ĥ_B^a‖ ≤ ε for all a ∈ Σ, then ⟨α, β, {A_a}⟩ = Spectral(H_B, H_B^Σ, r) and ⟨α̂, β̂, {Â_a}⟩ = Spectral(Ĥ_B, Ĥ_B^Σ, r) satisfy ‖α − α̂‖, ‖β − β̂‖, ‖A_a − Â_a‖ = O(ε)

10 Outline 1. From Hankel Matrices to Weighted Automata 2. From Data to Hankel Matrices 3. From Theory to Practice

11 Learning Models

1. Exact query learning: membership + equivalence queries [BV96; BBM06; BM15a]
2. Distributional PAC learning: samples from a stochastic WFA [HKZ09; BDR09; Bal+14]
3. Statistical learning: optimize output predictions with respect to a loss function [BM12; BM15b]

12 Exact Learning of WFA with Queries

Setup:
- Unknown f : Σ* → R with rank(f) = n
- Membership oracle: MQ_f(x) returns f(x) for any x ∈ Σ*
- Equivalence oracle: EQ_f(A) returns true if f = A, and (false, z) with f(z) ≠ A(z) otherwise

Algorithm:
1. Initialize P = S = {ε} and maintain B = (P, S)
2. Let A = Spectral(H_B, H_B^Σ, rank(H_B))
3. While EQ(A) = (false, z):
   3.1 Let z = p·a·s with p the longest prefix of z in P
   3.2 Let S = S ∪ suffixes(s)
   3.3 While there exist p ∈ P and a ∈ Σ such that H_B^a(p, ·) ∉ rowspan(H_B), add p·a to P
   3.4 Let A = Spectral(H_B, H_B^Σ, rank(H_B))

Analysis:
- At most n + 1 calls to EQ_f and O(|Σ| n² L) calls to MQ_f, where L = max |z| is the length of the longest counterexample
- Can be improved to O((|Σ| + log L) n²) calls to MQ_f; calls to EQ_f can be reduced by increasing calls to MQ_f
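
Below is a hedged sketch of this loop, not the paper's implementation: `mq` and `eq` stand for the membership and equivalence oracles (both assumptions), basis closure is tested numerically with least squares, and `spectral` is the routine sketched after slide 8.

```python
import numpy as np

def hankel_blocks(P, S, alphabet, mq):
    """Fill H_B and the blocks H_B^a by querying the membership oracle."""
    H = np.array([[mq(p + s) for s in S] for p in P])
    H_sigma = {a: np.array([[mq(p + a + s) for s in S] for p in P]) for a in alphabet}
    return H, H_sigma

def in_rowspan(row, M, tol=1e-8):
    """Numerically test whether `row` lies in the row span of M."""
    coeffs, *_ = np.linalg.lstsq(M.T, row, rcond=None)
    return np.linalg.norm(M.T @ coeffs - row) <= tol

def learn_with_queries(alphabet, mq, eq, max_rounds=100):
    P, S = [""], [""]
    for _ in range(max_rounds):
        H, H_sigma = hankel_blocks(P, S, alphabet, mq)
        A = spectral(H, H_sigma, np.linalg.matrix_rank(H))  # routine from the earlier sketch
        ok, z = eq(A)
        if ok:
            return A
        # 3.1-3.2: split z = p a s with p the longest prefix of z already in P,
        # then add all suffixes of s to S
        p = max((q for q in P if z.startswith(q)), key=len)
        s = z[len(p) + 1:]
        S += [s[i:] for i in range(len(s) + 1) if s[i:] not in S]
        # 3.3: close the basis, adding p a while some row H_a(p, .) leaves rowspan(H_B)
        changed = True
        while changed:
            changed = False
            H, H_sigma = hankel_blocks(P, S, alphabet, mq)
            for p in list(P):
                for a in alphabet:
                    if p + a not in P and not in_rowspan(H_sigma[a][P.index(p), :], H):
                        P.append(p + a)
                        changed = True
    raise RuntimeError("no equivalence reached within max_rounds")
```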

13 PAC Learning Stochastic WFA

Setup: Unknown f : Σ* → R with rank(f) = n defining a probability distribution on Σ*
Data: x^(1), ..., x^(m) i.i.d. strings sampled from f
Parameters: n and B = (P, S) such that rank(H_B) = n and ε ∈ P ∩ S

Algorithm:
1. Estimate the Hankel matrices Ĥ_B and Ĥ_B^a for all a ∈ Σ using the empirical probabilities f̂(x) = (1/m) Σ_{i=1}^m 1[x^(i) = x]
2. Return Â = Spectral(Ĥ_B, Ĥ_B^Σ, n)

Analysis:
- Running time is O(|P| |S| m + |Σ| |P| |S| n)
- With high probability, Σ_{|x| ≤ L} |f(x) − Â(x)| = O( L² |Σ| √n / (σ_n(H_B^f)² √m) )
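
A short sketch of step 1, building the empirical Hankel blocks from a sample; the toy sample and basis below are made-up assumptions, and the estimated blocks can be fed directly to the `spectral` routine sketched earlier.

```python
from collections import Counter
import numpy as np

def empirical_hankel(sample, prefixes, suffixes, alphabet):
    """Estimate H_B and the blocks H_B^a with f_hat(x) = (1/m) * #{i : x^(i) = x}."""
    m = len(sample)
    counts = Counter(sample)

    def f_hat(x):
        return counts[x] / m

    H = np.array([[f_hat(p + s) for s in suffixes] for p in prefixes])
    H_sigma = {a: np.array([[f_hat(p + a + s) for s in suffixes] for p in prefixes])
               for a in alphabet}
    return H, H_sigma

sample = ["ab", "a", "ab", "b", "aab", "ab"]        # toy i.i.d. draws (illustration only)
H_hat, H_hat_sigma = empirical_hankel(sample, ["", "a"], ["", "b"], "ab")
# alpha_hat, beta_hat, A_hat = spectral(H_hat, H_hat_sigma, r=2)
```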

14 Statistical Learning of WFA

Setup: Unknown distribution D over Σ* × R
Data: (x^(1), y^(1)), ..., (x^(m), y^(m)) i.i.d. string-label pairs sampled from D
Parameters: n, convex loss function ℓ : R × R → R_+, convex regularizer R, regularization parameter λ > 0, and B = (P, S) with ε ∈ P ∩ S

Algorithm:
1. Build B' = (P', S) with P' = P ∪ PΣ
2. Find the Hankel matrix Ĥ_B' solving min_H (1/m) Σ_{i=1}^m ℓ(H(x^(i)), y^(i)) + λ R(H)
3. Return Â = Spectral(Ĥ_B, Ĥ_B^Σ, n), where Ĥ_B and Ĥ_B^Σ are submatrices of Ĥ_B'

Analysis:
- Running time is polynomial in n, m, |Σ|, |P|, and |S|
- With high probability, E_{(x,y)∼D}[ℓ(Â(x), y)] ≤ (1/m) Σ_{i=1}^m ℓ(Â(x^(i)), y^(i)) + O(1/√m)
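
As a rough illustration of step 2, the sketch below solves a heavily simplified instance of the optimization: squared loss, nuclear-norm regularizer, and each training string mapped to a single (prefix, suffix) cell of H, whereas the actual formulation in [BM12] ties together all the Hankel cells spelling the same string. It uses proximal gradient descent with singular-value thresholding; all names and constants are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, d, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(d - tau, 0.0)) @ Vt

def hankel_regression(cells, values, shape, lam=0.1, step=0.1, iters=500):
    """Sketch of min_H (1/m) sum_i (H[cell_i] - y_i)^2 + lam * ||H||_* via proximal gradient."""
    H = np.zeros(shape)
    m = len(values)
    for _ in range(iters):
        grad = np.zeros(shape)
        for (i, j), y in zip(cells, values):
            grad[i, j] += 2.0 * (H[i, j] - y) / m
        H = svt(H - step * grad, step * lam)
    return H

# Toy usage: three labelled strings mapped to cells of a 3 x 3 Hankel block (assumption).
H_hat = hankel_regression(cells=[(0, 1), (1, 0), (1, 2)],
                          values=[0.5, 0.5, 0.25], shape=(3, 3))
```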

15 Outline 1. From Hankel Matrices to Weighted Automata 2. From Data to Hankel Matrices 3. From Theory to Practice

16 Extensions

1. More complex models: transducers and taggers [BQC11; Qua+14]; grammars and tree automata [Luq+12; Bal+14; RBC16]; reactive models [BBP15; LBP16; BM17a]
2. More realistic setups: multiple related tasks [RBP17]; timing data [BBP15; LBP16]; single trajectory [BM17a]; probabilistic models [BHP14]
3. Deeper theory: convex relaxations [BQC12]; generalization bounds [BM15b; BM17b]; approximate minimisation [BPP15]; bisimulation metrics [BGP17]

17 And It Works Too!

Spectral methods are competitive against traditional methods: expectation maximization, conditional random fields, tensor decompositions

In a variety of problems: sequence tagging, constituency and dependency parsing, timing and geometry learning, POS-level language modelling

[figures: experimental plots comparing spectral learning against EM, CRF, and tensor-decomposition baselines on these tasks, reporting L1 distance and word error rate versus sample size, Hamming accuracy versus training set size, runtime, relative error versus Hankel rank and sequence length, and accuracy versus number of states for several choices of basis]

18 Open Problems and Current Trends

- Optimal selection of P and S from data
- Scalable convex optimization over sets of Hankel matrices
- Constraining the output WFA (e.g. probabilistic automata)
- Relations between learning and approximate minimisation
- How much of this can be extended to WFA over semi-rings?
- Spectral methods for initializing non-convex gradient-based learning algorithms

19 Conclusion

Take-home points:
- A single building block based on SVD of Hankel matrices
- Implementation only requires linear algebra
- Analysis involves linear algebra, probability, convex optimization
- Can be made practical for a variety of models and applications

Want to know more?
- EMNLP 2014 tutorial (with slides, video, and code)
- Survey papers [BM15a; TJ15]
- Python toolkit Sp2Learn [Arr+16]
- Neighbouring literature: predictive state representations (PSR) [LSS02] and observable operator models (OOM) [Jae00]

20 Thanks To All My Collaborators! Guillaume Rabusseau Franco M. Luque Xavier Carreras Mehryar Mohri Prakash Panangaden Pierre-Luc Bacon Pascale Gourdeau Odalric-Ambrym Maillard Will Hamilton Lucas Langer Shay Cohen Amir Globerson Joelle Pineau Doina Precup Ariadna Quattoni

21 Bibliography I

[Arr+16] D. Arrivault, D. Benielli, F. Denis, and R. Eyraud. Sp2Learn: A Toolbox for the Spectral Learning of Weighted Automata. In: ICGI.
[Bal+14] B. Balle, X. Carreras, F. M. Luque, and A. Quattoni. Spectral learning of weighted automata: A forward-backward perspective. In: Machine Learning (2014).
[Bal13] B. Balle. Learning Finite-State Machines: Algorithmic and Statistical Aspects. PhD thesis. Universitat Politècnica de Catalunya.
[BBM06] L. Bisht, N. H. Bshouty, and H. Mazzawi. On Optimal Learning Algorithms for Multiplicity Automata. In: COLT.
[BBP15] P.-L. Bacon, B. Balle, and D. Precup. Learning and Planning with Timing Information in Markov Decision Processes. In: UAI.
[BDR09] R. Bailly, F. Denis, and L. Ralaivola. Grammatical inference as a principal component analysis problem. In: ICML.

22 Bibliography II

[BGP17] B. Balle, P. Gourdeau, and P. Panangaden. Bisimulation Metrics for Weighted Automata. In: ICALP.
[BHP14] B. Balle, W. L. Hamilton, and J. Pineau. Methods of Moments for Learning Stochastic Languages: Unified Presentation and Empirical Comparison. In: ICML.
[BM12] B. Balle and M. Mohri. Spectral learning of general weighted automata via constrained matrix completion. In: NIPS.
[BM15a] B. Balle and M. Mohri. Learning Weighted Automata (invited paper). In: CAI.
[BM15b] B. Balle and M. Mohri. On the Rademacher complexity of weighted automata. In: ALT.
[BM17a] B. Balle and O.-A. Maillard. Spectral Learning from a Single Trajectory under Finite-State Policies. In: ICML.

23 Bibliography III

[BM17b] B. Balle and M. Mohri. Generalization Bounds for Learning Weighted Automata. In: Theor. Comput. Sci. (to appear) (2017).
[BPP15] B. Balle, P. Panangaden, and D. Precup. A Canonical Form for Weighted Automata and Applications to Approximate Minimization. In: LICS.
[BQC11] B. Balle, A. Quattoni, and X. Carreras. A spectral learning algorithm for finite state transducers. In: ECML-PKDD.
[BQC12] B. Balle, A. Quattoni, and X. Carreras. Local loss optimization in operator models: A new insight into spectral learning. In: ICML.
[BV96] F. Bergadano and S. Varricchio. Learning behaviors of automata from multiplicity and equivalence queries. In: SIAM Journal on Computing (1996).
[Fli74] M. Fliess. Matrices de Hankel. In: Journal de Mathématiques Pures et Appliquées (1974).
[HKZ09] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In: COLT.

24 Bibliography IV

[Jae00] H. Jaeger. Observable operator models for discrete stochastic time series. In: Neural Computation (2000).
[LBP16] L. Langer, B. Balle, and D. Precup. Learning Multi-Step Predictive State Representations. In: IJCAI.
[LSS02] M. Littman, R. S. Sutton, and S. Singh. Predictive representations of state. In: NIPS.
[Luq+12] F. M. Luque, A. Quattoni, B. Balle, and X. Carreras. Spectral learning in non-deterministic dependency parsing. In: EACL.
[Qua+14] A. Quattoni, B. Balle, X. Carreras, and A. Globerson. Spectral Regularization for Max-Margin Sequence Tagging. In: ICML.
[RBC16] G. Rabusseau, B. Balle, and S. B. Cohen. Low-Rank Approximation of Weighted Tree Automata. In: AISTATS.
[RBP17] G. Rabusseau, B. Balle, and J. Pineau. Multitask Spectral Learning of Weighted Automata. In: NIPS.

25 Bibliography V

[TJ15] M. R. Thon and H. Jaeger. Links between multiplicity automata, observable operator models and predictive state representations: a unified learning framework. In: Journal of Machine Learning Research (2015).

26 Learning Automata with Hankel Matrices Borja Balle Amazon Research Cambridge Highlights London, September 2017
