Language Learning Problems in the Principles and Parameters Framework

Language Learning Problems in the Principles and Parameters Framework
Partha Niyogi
Presented by Chunchuan Lyu, March 22, 2016

Overview
1 The Principles and Parameters Framework
  - space of learning problems
  - X-bar theory
  - Triggering Learning Algorithm
2 Formal Analysis of the Triggering Learning Algorithm
  - parameter space learning with TLA as a Markov chain
  - learnability theorem
  - rate of convergence
3 Discussion
  - summary
  - continued works and open questions

The Principles and Parameters Framework: Problem of Language Learning
Question: under what conditions is language learning possible?
Goal: put the linguistic problem into a formal setting.
Given a data distribution P, use an algorithm A to identify the target grammar G_t within a hypothesis space H.

The Principles and Parameters Framework: Space of learning problems
[Figure: the five important dimensions of every learning problem]

The Principles and Parameters Framework: X-bar theory
Production rules:
  XP -> Spec X' (p1 = 0)  or  X' Spec (p1 = 1)
  X' -> Comp X' (p2 = 0)  or  X' Comp (p2 = 1)
  X' -> X
  Spec/Comp -> XP or ε
X: lexical type, e.g. noun, verb, adjective
Spec/Comp: specifier / complement
p3: the V2 parameter (binary), controlling the transformation from underlying structure to surface form
With 3 binary parameters in total, G_t can be parameterized as, e.g., [1 0 1].
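As a small illustration (a sketch, assuming each grammar is represented simply by its parameter vector, which the source does not prescribe), the three binary parameters induce a grammar space of 2^3 = 8 points:

```python
from itertools import product

# The three binary parameters (p1: spec position, p2: comp position,
# p3: V2) induce a grammar space of 2^3 = 8 candidate grammars, each
# identified with its parameter vector, e.g. the target G_t = [1, 0, 1].
grammar_space = [list(bits) for bits in product((0, 1), repeat=3)]
target = [1, 0, 1]
assert len(grammar_space) == 8 and target in grammar_space
```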

The Principles and Parameters Framework: X-bar theory
[Figure: a spec-first, comp-final structure]

The Principles and Parameters Framework: Triggering Learning Algorithm
1 [Initialize] Start at a random initial point in the space of possible parameter settings, specifying a single hypothesized grammar G_h with its resulting extension as a language L(G_h).
2 [Process input sentence] Receive a positive example sentence s_i at time t_i (drawn from the distribution over L(G_t)).
3 [Learnability on error detection] If the current grammar parses s_i, go to step 2; otherwise, continue.
4 [Single-step hill climbing] Select a single parameter uniformly at random to flip from its current setting, change it iff the change allows the current sentence to be analyzed, and go to step 2.
Remarks (see the sketch below):
- Only one parameter is changed per step.
- A parameter is changed only when the flip makes the current sentence parse.
- The algorithm is memoryless.
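Below is a minimal Python sketch of the TLA as stated above. The predicate `parses(grammar, sentence)` and the stream of example sentences are assumed, hypothetical inputs; neither is specified in the source.

```python
import random

def tla(parses, n_params, sentences, n_steps=10_000):
    """A sketch of the Triggering Learning Algorithm.

    `parses(grammar, sentence)` is an assumed predicate deciding whether
    the grammar given by the binary parameter vector `grammar` can analyze
    `sentence`; `sentences` is an iterable of positive examples drawn from
    the target language L(G_t).
    """
    # [Initialize] random point in the space of parameter settings
    grammar = [random.randint(0, 1) for _ in range(n_params)]
    for _, s in zip(range(n_steps), sentences):
        # [Learnability on error detection] keep the grammar if it parses s
        if parses(grammar, s):
            continue
        # [Single-step hill climbing] flip one parameter chosen uniformly
        # at random; keep the flip only if s now parses
        i = random.randrange(n_params)
        candidate = grammar.copy()
        candidate[i] ^= 1
        if parses(candidate, s):
            grammar = candidate
    return grammar
```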

Formal Analysis of the Triggering Learning Algorithm: parameter space learning with TLA as a Markov chain
- N binary parameters give a space H of grammars with 2^N points.
- Transitions occur only between states that differ in a single parameter.
- p_ij is the probability of a transition from state i to state j.
- The probabilities p_ij are determined by the probability distribution P over L(G_t) and the learning algorithm A.
- The tuple <A, P, G_t, H> characterizes a Markov chain.

Formal Analysis of the Triggering Learning Algorithm: parameter space learning with TLA as a Markov chain
Definition (Gold learnability): G_t is Gold learnable by A for distribution P if and only if A identifies G_t in the limit, as the number of examples n goes to infinity, with probability 1.
Definition (Absorbing state): Given a Markov chain M, an absorbing state of M is a state s of M that has no exit arcs to any other state of M.
Definition (Closed set of states): A closed set of states C is any proper subset of the states of M such that there is no positive-probability transition from any state in C to any state not in C.

Formal Analysis of the Triggering Learning Algorithm: learnability theorem
Theorem: Let <A, P, G_t, H> be a memoryless learning system and let M be the Markov chain associated with this learning system. Then G_t is Gold learnable by A for distribution P if and only if every closed set in M includes the target state corresponding to G_t.
Interpretation:
[only if] Gold learnability requires that the algorithm make no mistake from which it cannot recover.
[if] As long as it never makes such a mistake, Gold learnability is assured.

Formal Analysis of the Triggering Learning Algorithm: learnability theorem
Theorem (learnability theorem): Let <A, P, G_t, H> be a memoryless learning system and let M be the Markov chain associated with this learning system. Then G_t is Gold learnable by A for distribution P if and only if every closed set in M includes the target state corresponding to G_t.
Key ideas in the proof (for experts):
[only if] Proof by contradiction: if some closed set excluded the target state, the learner could enter that set and never reach G_t.
[if] By construction, M decomposes into transient states and a single closed set containing only the target state. As the number of sentences goes to infinity, the probability of moving from any transient state to any other transient state goes to 0, which leaves the limiting probability of identifying G_t equal to 1.
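A minimal sketch of checking the theorem's criterion on a finite chain, assuming the transition matrix is given as a nested list of probabilities. It relies on the finite-chain equivalence between "every closed set contains the target state" and "the target state is reachable with positive probability from every state", which holds because the set of states reachable from any given state is itself closed:

```python
from collections import deque

def gold_learnable(P, target):
    """Check the learnability criterion for a finite TLA Markov chain.

    P is an n x n transition matrix (list of lists) and `target` is the
    index of the target grammar's state.  Every closed set contains the
    target state exactly when the target is reachable from every state,
    so we run a reverse breadth-first search from the target.
    """
    n = len(P)
    # predecessors: states j with a positive-probability arc j -> i
    preds = [[j for j in range(n) if j != i and P[j][i] > 0] for i in range(n)]
    seen = {target}
    queue = deque([target])
    while queue:
        i = queue.popleft()
        for j in preds[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n
```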

Formal Analysis of the Triggering Learning Algorithm: Rate of convergence
How fast is the convergence? Computing the transition probabilities (see the sketch below):
1 [Assign distribution] Fix a probability measure P on the sentences of the target language L(G_t) (abbreviated L_t).
2 [Enumerate states] Assign a state to each language, i.e., to each L_i.
3 [Normalize by the target language] Set L_i := L_i ∩ L_t.
4 [Take set differences] For any two states i ≠ j: if their corresponding grammars differ in more than one parameter, p_ij = 0; otherwise p_ij = (1/N) P(L_j \ L_i), where N is the number of parameters. For i = j, p_ii = 1 − Σ_{k ≠ i} p_ik.
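A minimal sketch of this construction, assuming each grammar's language is available as a finite set of sentences, `prob` maps each sentence of L_t to its probability, and states are indexed by their parameter vectors packed as integers (so the Hamming distance between indices counts differing parameters). These representational choices are assumptions for illustration, not part of the source.

```python
def tla_transition_matrix(languages, prob, target, n_params):
    """Build the TLA Markov chain's transition matrix.

    `languages[i]` is the (finite) set of sentences generated by the
    grammar at state i, `prob[s]` is the probability of sentence s under
    the measure on the target language, and `target` is the index of the
    target grammar's state.
    """
    L_t = languages[target]
    # restrict every language to the sentences of the target language
    L = [Li & L_t for Li in languages]
    n = len(languages)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # grammars must differ in exactly one parameter
            if bin(i ^ j).count("1") != 1:
                continue
            # probability of drawing a sentence in L_j but not in L_i,
            # times 1/N for picking that particular parameter to flip
            P[i][j] = sum(prob[s] for s in L[j] - L[i]) / n_params
        P[i][i] = 1.0 - sum(P[i])
    return P
```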

Formal Analysis of the Triggering Learning Algorithm: rate of convergence
Represent powers of the transition matrix P through its eigenvalue decomposition (y_i and x_i being the right and left eigenvectors for eigenvalue λ_i):
  P^k = Σ_{i=1}^{m} λ_i^k y_i x_i
Let Π_0 be the probability distribution over initial states, which could be uniform.
Let Π_k = (π_1(k), ..., π_m(k)) be the probability distribution over states at time k, so Π_k = Π_0 P^k, and let Π_∞ = lim_{k→∞} Π_k.
λ_1 = 1 corresponds to the target state, which is an absorbing state.

Theorem (convergence theorem)
  ‖Π_k − Π_∞‖ = ‖ Σ_{i=2}^{m} λ_i^k Π_0 y_i x_i ‖ ≤ max_{2 ≤ i ≤ m} |λ_i|^k · Σ_{i=2}^{m} ‖Π_0 y_i x_i‖
Interpretation:
- If more than one λ_i equals 1, the target grammar is not Gold learnable. This is due to the presence of multiple absorbing states, which implies the existence of a closed set of states that does not contain the target state.
- Otherwise, the second-largest eigenvalue (in modulus) of the transition matrix determines the speed of convergence.
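A minimal sketch of reading off this rate numerically, assuming the transition matrix is diagonalizable and given as a dense array; it only uses NumPy's standard eigenvalue routine:

```python
import numpy as np

def convergence_rate(P, tol=1e-9):
    """Estimate the convergence rate from the eigenvalues of P.

    Returns the modulus of the second-largest eigenvalue, which bounds how
    fast Pi_k approaches Pi_inf.  If more than one eigenvalue equals 1 (up
    to `tol`), the chain has several closed structures and the target
    grammar is not Gold learnable.
    """
    eigvals = np.linalg.eigvals(np.asarray(P, dtype=float))
    moduli = np.sort(np.abs(eigvals))[::-1]
    if np.sum(np.isclose(moduli, 1.0, atol=tol)) > 1:
        raise ValueError("more than one unit eigenvalue: not Gold learnable")
    return moduli[1]
```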

Summary
Turning a linguistics problem into formal analysis allows making precise statements.
Learning in a parameterized theory, along the five dimensions of the learning-problem space shown earlier:
1 parameterization: finite
2 distribution of data: independent draws
3 noise: noiseless
4 algorithm: TLA
5 memory: memoryless
Learning in this framework can be framed as a Markov chain:
1 G_t is Gold learnable iff the target state is included in every closed set of states.
2 The rate of convergence can be computed from the transition matrix.

Extended works
- Altering the five elements: e.g. changing the TLA to a random walk ensures learnability.
- More formalism: introducing a metric between grammars, probably approximately correct (PAC) learning, VC dimension, and so on.
- Learning over generations can be modeled as population dynamics, giving a framework for language evolution.
- The origin of language, and why language works as it does: treat grammars as genes and communication efficiency as the selection criterion.

Open questions
- How to handle ambiguity in parsing: L does not seem to be identifiable with G.
- Countable and uncountable parameterizations: the first theorem requires finiteness.
- Interactions between the teacher's distribution and the learner.
Q & A

References
Partha Niyogi. The Computational Nature of Language Learning and Evolution. MIT Press, Cambridge, MA, 2006.
Partha Niyogi. The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar. Springer Science & Business Media, 2012.
Matilde Marcolli. CS101: Mathematical and Computational Linguistics. Lecture notes, 2015.