Models of Language Acquisition: Part II

1 Models of Language Acquisition: Part II
Matilde Marcolli
CS101: Mathematical and Computational Linguistics
Winter 2015

2 Probably Approximately Correct Model of Language Learning
- General setting of Statistical Learning Theory: the objects of learning are functions
- concept class: set F of possible target functions
- hypothesis class: set H of functions f : X → Y
- typically assume F = H
- for the language case: X = A, the set of all possible strings on an alphabet, and Y = {0, 1}
- given a language L ⊆ A, consider the associated indicator function (characteristic function) χ_L : A → {0, 1}, with χ_L(x) = 1 if x ∈ L and χ_L(x) = 0 if x ∉ L

3 Language L can be seen as
1. a recursively enumerable subset L ⊆ A
2. a Turing machine (program) that recognizes L
3. its indicator function χ_L
Distance function on the space of languages (something better than the discrete 0/1 metric): the L¹(P)-distance
- use a probability measure P on A (Bernoulli, Markov, ...)
- define the L¹(P)-distance as d_P(L, L′) = Σ_{s ∈ A} |χ_L(s) − χ_{L′}(s)| P(s)
- ɛ-neighborhood of a language L: N_ɛ(L) = {L′ : d_P(L, L′) < ɛ}
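A minimal Python sketch (not from the slides) of the indicator function and the L¹(P)-distance, assuming toy languages given as finite sets of strings and a measure P supported on finitely many strings, so the sum is finite:

```python
def chi(L):
    """Indicator function chi_L of a language L given as a set of strings."""
    return lambda s: 1 if s in L else 0

def d_P(L1, L2, P):
    """d_P(L1, L2) = sum_s |chi_L1(s) - chi_L2(s)| * P(s), with P a dict string -> prob."""
    c1, c2 = chi(L1), chi(L2)
    return sum(abs(c1(s) - c2(s)) * p for s, p in P.items())

# Hypothetical example over the alphabet {a, b}, P concentrated on a few short strings.
P = {"a": 0.3, "b": 0.3, "aa": 0.2, "ab": 0.1, "ba": 0.1}
L1 = {"a", "aa", "ab"}
L2 = {"a", "ab", "ba"}
print(d_P(L1, L2, P))  # approximately 0.3: the languages disagree only on "aa" and "ba"
```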

4 Examples are randomly presented to a learner according to the probability distribution P
- Note: both positive and negative examples
- view examples as pairs (x, y) with x ∈ A and y = χ_L(x)
- data set: D = ∪_k D_k, where D_k = {(z_1, ..., z_k) : z_i = (x_i, y_i), x_i ∈ A, y_i ∈ {0, 1}}
- learning algorithm A : D → H
- after k data points the learner conjectures a function ĥ_k ∈ H
- procedure that minimizes the empirical risk:
  ĥ_k = arg min_{h ∈ H} (1/k) Σ_{j=1}^{k} |y_j − h(x_j)|
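For concreteness, a small sketch of empirical risk minimization over a finite hypothesis class; the hypotheses, sample, and implicit target language here are made up purely for illustration:

```python
def empirical_risk(L, data):
    """(1/k) * sum over the labeled sample of |y_j - chi_L(x_j)|."""
    return sum(abs(y - (1 if x in L else 0)) for x, y in data) / len(data)

def erm(H, data):
    """Return a hypothesis in H with minimal empirical risk on the sample."""
    return min(H, key=lambda L: empirical_risk(L, data))

# Labeled examples (x, y) with y = chi_L(x) for a hypothetical target language {"a", "ab"}.
data = [("a", 1), ("b", 0), ("ab", 1), ("ba", 0), ("aa", 0)]
H = [frozenset({"a"}), frozenset({"a", "ab"}), frozenset({"a", "ab", "aa"})]
print(sorted(erm(H, data)))  # ['a', 'ab'] -- this hypothesis achieves zero empirical risk
```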

5 ĥ_k = A(δ_k), with δ_k ∈ D_k a random element
- successful learning: the hypothesis ĥ_k converges to the target χ_L as k → ∞
- the hypothesis ĥ_k is a random function (because δ_k ∈ D_k is random), so convergence is needed in a probabilistic sense:
  lim_{k → ∞} P( d_P(ĥ_k, χ_L) > ɛ ) = 0
- this means weak convergence of the random variables, ĥ_k = A(δ_k) →_w χ_L:
  E_P( |ĥ_k(s) − χ_L(s)| ) = Σ_{s ∈ A} |ĥ_k(s) − χ_L(s)| P(s) → 0
- Note the double role of the probability P: it defines the L¹(P)-distance used for convergence, and it also governs the drawing of the random data δ_k ∈ D_k

6 target function: the minimizer of the expected risk, χ_L = arg min_h E_P( |h(t) − χ_L(t)| ), with t drawn according to P
- a set of elements S = {s_1, ..., s_n} in A is shattered by the set of functions H if, for every binary vector b = (b_1, ..., b_n), there is a function h_b ∈ H such that h_b(s_i) = 1 iff b_i = 1
- this means that for every way of partitioning the set S into two parts, there is a function in H that implements the partition (so H must have at least 2^n elements)
- the Vapnik-Chervonenkis dimension of H is D if there is at least one set of D elements that is shattered by H and no set of D + 1 elements is (if no such D exists, then dim_VC H = ∞)
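The shattering condition can be checked by brute force on small examples. The sketch below is illustrative only: the threshold-language class over a tiny numeric domain is a made-up toy, not one of the classes discussed in the slides.

```python
from itertools import combinations

def shatters(H, S):
    """True if every binary labeling of S is realized by some h in H."""
    realized = {tuple(h(x) for x in S) for h in H}
    return len(realized) == 2 ** len(S)

def vc_dimension(H, domain):
    """Largest D such that some D-element subset of the domain is shattered by H."""
    D = 0
    for d in range(1, len(domain) + 1):
        if any(shatters(H, S) for S in combinations(domain, d)):
            D = d
    return D

# Toy class: threshold indicators h_t(x) = 1 iff x >= t, on the domain {0, ..., 4}.
domain = list(range(5))
H = [(lambda t: (lambda x: 1 if x >= t else 0))(t) for t in range(6)]
print(vc_dimension(H, domain))  # 1: single points are shattered, but no pair is
```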

7 Learnability Fact
- for a set H of languages (identified with their indicator functions χ_L): the languages L of H are learnable iff the Vapnik-Chervonenkis dimension satisfies dim_VC H < ∞
- here learnability means weak convergence ĥ_k = A(δ_k) →_w χ_L
- with ĥ_k = χ_{L̂_k} given by empirical risk minimization:
  L̂_k = arg min_L (1/k) Σ_{j=1}^{k} |y_j − χ_L(x_j)|
- why is a finite Vapnik-Chervonenkis dimension needed? Lower bound on learnability...

8 Lower bound for learning
- suppose the Vapnik-Chervonenkis dimension is dim_VC H = D
- construct a probability distribution P on A with respect to which the learner needs to draw at least
  m ≥ (D/4) log₂(3/2) + log₂(1/(8δ))
  examples in order to have P( d(ĥ_m, h) > ɛ ) < δ
- so in particular, if D = ∞, one does not have learnability (for this P)
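A quick numerical illustration of how this lower bound scales; the values of D and δ below are arbitrary:

```python
from math import log2

def sample_lower_bound(D, delta):
    """m must be at least this large to have P(d(h_m, h) > eps) < delta (for eps < 1/8)."""
    return (D / 4) * log2(3 / 2) + log2(1 / (8 * delta))

for D in (10, 100, 1000):
    for delta in (0.05, 0.01):
        print(f"D = {D:4d}, delta = {delta}: need m >= {sample_lower_bound(D, delta):.1f}")
# The bound grows linearly in D, so if dim_VC H is infinite no finite sample suffices.
```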

9 Construction of P
- since dim_VC H = D, there is a set x_1, ..., x_D that is shattered by H: assign measure P(x_i) = 1/D to these points and measure zero to all other points in A
- in this measure two functions h_1, h_2 ∈ H have distance d_P(h_1, h_2) = 0 iff they agree on all the points x_i
- mod out H by the equivalence relation h_1 ∼ h_2 iff h_1(x_i) = h_2(x_i) for all 1 ≤ i ≤ D: the set of equivalence classes H̄ = H/∼ has 2^D points

10 Partitioning of H̄
- draw a sequence z = (z_1, ..., z_m) of random data according to the probability distribution P
- suppose z contains l distinct elements among X = {x_1, ..., x_D} (the remaining D − l do not occur in z)
- there are then 2^l possible ways in which a potential candidate target function h ∈ H̄ = H/∼ can label z = z_h
- these choices determine a partitioning of H̄ into disjoint subsets H̄ = H̄_1 ∪ ... ∪ H̄_{2^l}
- each H̄_i in this partition contains exactly 2^{D−l} different functions, all agreeing on the l distinct elements in z

11 Estimate of the sum
  Σ_{h ∈ H̄} d(A(z_h), h) = Σ_{i=1}^{2^l} Σ_{h ∈ H̄_i} d(A(z_h), h)
- the 2^{D−l} functions in H̄_i all agree on the data set z_h, while on the remaining D − l elements of X the functions h and A(z_h) in H̄_i may disagree somewhere
- if A(z_h) and h disagree in j places then d(A(z_h), h) ≥ j/D, and this can happen in C(D−l, j) possible ways (binomial coefficient), so
  Σ_{h ∈ H̄_i} d(A(z_h), h) ≥ Σ_{j=0}^{D−l} C(D−l, j) (j/D) = 2^{D−l} (D−l)/(2D)
  Σ_{h ∈ H̄} d(A(z_h), h) ≥ 2^D (D−l)/(2D)
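The middle equality uses the binomial identity Σ_j C(n, j)·j = n·2^{n−1} with n = D − l; a short numerical check of that identity:

```python
from math import comb

# sum_{j=0}^{n} C(n, j) * j equals n * 2^(n-1); dividing by D and setting n = D - l
# gives the 2^(D-l)(D-l)/(2D) bound stated above.
for n in range(1, 9):
    assert sum(comb(n, j) * j for j in range(n + 1)) == n * 2 ** (n - 1)
print("identity sum_j C(n,j)*j = n*2^(n-1) verified for n = 1,...,8")
```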

12 Candidate target function
- set S_l = {z : z has l distinct elements}; then
  (1/2^D) Σ_{z ∈ S_l} P(z) Σ_{h ∈ H̄} d(A(z), h) ≥ (D−l)/(2D) P(S_l)
- change the order of summation: (1/2^D) Σ_h Σ_z P(z) d(A(z), h); for the inequality to hold there must be at least one h = h* with
  Σ_{z ∈ S_l} P(z) d(A(z), h*) ≥ (D−l)/(2D) P(S_l)
- this h = h* is a candidate target function, with a certain estimate of the inaccuracy of the learning hypothesis

13 Inaccuracy estimate
- set of draws of m data on which the learner's hypothesis A(z) differs from the candidate target h* by more than a given size β:
  S_β = {z ∈ S_l : d(A(z), h*) > β}
- lower bound on P(S_β):
  (D−l)/(2D) P(S_l) ≤ Σ_{z ∈ S_β} P(z) d(A(z), h*) + Σ_{z ∈ S_l ∖ S_β} P(z) d(A(z), h*) ≤ P(S_β) + β (P(S_l) − P(S_β))
- this gives P(S_β) ≥ (1 − β) P(S_β) ≥ ( (D−l)/(2D) − β ) P(S_l)

14 Arrange for P(S_ɛ) > δ
- if the target is h*, then with probability P(S_ɛ) the learner's hypothesis is more than ɛ away from the target
- take the arbitrary l to be l = D/2 and ɛ < 1/8; then ( (D−l)/(2D) − ɛ ) P(S_l) > (1/8) P(S_l)
- so if P(S_{D/2}) > 8δ one also gets P(S_ɛ) > δ
- thus one can arrange that the probability of the learner's hypothesis differing from the target by more than ɛ is greater than δ
- it remains to find conditions for P(S_{D/2}) > 8δ

15 Arrange for P(S_{D/2}) > 8δ
- P(S_l) = probability of drawing l distinct elements of X = {x_1, ..., x_D} in m identically distributed trials
- there are C(D, l) ways of choosing the l elements; for each choice there are l! ways in which the items can appear in the first l positions
- S_l^(i) ⊂ S_l: set of all z = (z_1, ..., z_m) with the i-th choice of placement of the l distinct elements in the first l positions (the remaining m − l positions contain the same l elements, disposed in any way)
- P(S_l^(i)) = (1/D)^l (l/D)^{m−l}
- the S_l^(i) are disjoint, so P(S_l) ≥ C(D, l) l! P(S_l^(i)) = C(D, l) l! (1/D)^l (l/D)^{m−l}
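As a sanity check (purely illustrative, with arbitrary small values of D, l, m), one can compare a Monte Carlo estimate of P(S_l) with this combinatorial lower bound:

```python
import random
from math import comb, factorial

# Arbitrary small values, chosen only for the demonstration.
D, l, m, trials = 10, 5, 8, 200_000

# Count draws of m uniform samples from {0, ..., D-1} that hit exactly l distinct values.
hits = sum(
    1 for _ in range(trials)
    if len({random.randrange(D) for _ in range(m)}) == l
)
monte_carlo = hits / trials
bound = comb(D, l) * factorial(l) * (1 / D) ** l * (l / D) ** (m - l)
print(f"P(S_l) ~ {monte_carlo:.3f}  >=  combinatorial lower bound {bound:.3f}")
```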

16 for D = 2l one has C(D, l) l! (1/D)^l (l/D)^{m−l} = (2l)! / (l! l^l) · 2^{−m}
- also (2l)! / (l! l^l) = Π_{j=1}^{l} (1 + j/l) ≥ (3/2)^{l/2}
- then P(S_{D/2}) ≥ 2^{−m} (3/2)^{D/4}, so P(S_{D/2}) > 8δ holds whenever
  m < (D/4) log₂(3/2) + log₂(1/(8δ))
- conclusion: in the constructed probability P the learner needs at least m larger than the above bound to achieve P( d(A(z), h*) > ɛ ) < δ
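A quick numerical check of the product inequality (2l)!/(l! l^l) = Π_{j=1}^{l} (1 + j/l) ≥ (3/2)^{l/2} used in this step:

```python
from math import factorial, prod

for l in (2, 4, 8, 16):
    lhs = factorial(2 * l) / (factorial(l) * l ** l)
    via_product = prod(1 + j / l for j in range(1, l + 1))
    rhs = (3 / 2) ** (l / 2)
    assert abs(lhs - via_product) < 1e-9 * lhs and lhs >= rhs
    print(f"l = {l:2d}: (2l)!/(l! l^l) = {lhs:8.2f} >= (3/2)^(l/2) = {rhs:6.2f}")
```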

17 Unlearnability
- the problem remains!
- the set of all finite languages is unlearnable
- the set of all regular languages is unlearnable
- the set of all context-free languages is unlearnable
- so impose further constraints on learning: limit the size of grammars... a constraint on the number of production rules

18 Example
- H_{n,k} = class of regular grammars recognized by deterministic finite state automata with at most n states, over a fixed number of letters #A = k
- size of this family: #H_{n,k} ≤ n^{nk}
- then the Vapnik-Chervonenkis dimension satisfies dim_VC H_{n,k} ≤ log₂(n^{nk}) = nk log₂(n)
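A rough sketch of what this finite bound gives numerically; the counting here is the crude n^{nk} transition-table bound from the slide (ignoring accepting-state and start-state choices), and the (n, k) values are arbitrary:

```python
from math import log2

def vc_upper_bound(n, k):
    """dim_VC H_{n,k} <= log2(#H_{n,k}) <= n * k * log2(n)."""
    return n * k * log2(n)

for n, k in [(2, 2), (5, 2), (10, 26)]:
    print(f"n = {n:2d} states, k = {k:2d} letters: dim_VC <= {vc_upper_bound(n, k):.1f}")
# Finite for every fixed n and k, so bounded-size automata escape the unlearnability problem.
```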
