On the power and the limits of evolvability. Vitaly Feldman Almaden Research Center


Learning from examples vs evolvability
[Venn diagram: the evolvable classes form a subset of the classes learnable from examples; parity functions are learnable from examples but not evolvable.]

The core model: PAC [Valiant 84]
Learner observes random examples (x, f(x))
o Assumption: the unknown Boolean function f: X → {−1, 1} labeling the examples comes from a known class of functions C
o Distribution D over X (e.g. R^n or {−1, 1}^n); the learner must work for every distribution D
For every f ∈ C and ε > 0, w.h.p. output h: X → [−1, 1] s.t. E_{x∼D}[f(x)·h(x)] ≥ 1 − ε
o For Boolean h this is equivalent to Pr_{x∼D}[f(x) = h(x)] ≥ 1 − ε/2, since E[f·h] = 1 − 2·Pr[f ≠ h]
Efficient: poly(1/ε, |x|) time

Classical example: C = halfspaces / linear threshold functions (LTFs)
o sign(Σ_i w_i x_i − θ) for w_1, w_2, ..., w_n, θ ∈ R
o equivalent to the examples being linearly separable
Perceptron algorithm [Rosenblatt 57; Block 62; Novikoff 62]
o Start with the LTF h_0 defined by w_0 = (0, 0, ..., 0); θ_0 = 0
o Get a random example (x, ℓ). If h_t(x) = ℓ, do nothing
o Else let h_{t+1} be the LTF defined by w_{t+1} = w_t + ℓ·x; θ_{t+1} = θ_t − ℓ
Gives PAC learning if f has a significant margin γ on the observed data points
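
As a concrete illustration (my own sketch, not from the slides), here is the Perceptron update rule described above in a minimal form; the function name and the loop structure are assumptions, only the mistake-driven update w ← w + ℓ·x, θ ← θ − ℓ comes from the slide.

```python
import numpy as np

def perceptron(examples, max_passes=100):
    """Minimal Perceptron sketch: on a mistake for example (x, l),
    update w <- w + l*x and theta <- theta - l, as on the slide."""
    n = len(examples[0][0])
    w, theta = np.zeros(n), 0.0
    for _ in range(max_passes):
        mistakes = 0
        for x, l in examples:
            pred = 1 if np.dot(w, x) - theta > 0 else -1
            if pred != l:          # mistake: move the separator toward x
                w = w + l * np.asarray(x, dtype=float)
                theta -= l
                mistakes += 1
        if mistakes == 0:          # consistent with all observed examples
            break
    return w, theta
```

When the data are separable with margin γ, Novikoff's mistake bound limits the number of updates, which is what gives the PAC guarantee mentioned on the slide.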

Evolution algorithm (R, M)
o R: representation class of functions over X, e.g. all linear thresholds over R^n
o M: randomized mutation algorithm that, given r ∈ R, outputs (a random mutation) r' ∈ R
o Efficient: poly in 1/ε, n
o E.g. choose a random i and adjust w_i by 0, +1 or −1 at random
[Diagram: the mutation algorithm acting on the current hypothesis.]
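
A minimal sketch (my own illustration, not from the slides) of the example mutation algorithm for LTF representations: pick a random coordinate and perturb its weight by −1, 0, or +1.

```python
import numpy as np

def mutate_ltf(w, theta, rng=np.random.default_rng()):
    """Return a random mutation of the LTF (w, theta):
    pick a random coordinate i and add -1, 0 or +1 to w_i."""
    w_new = np.array(w, dtype=float)
    i = rng.integers(len(w_new))
    w_new[i] += rng.choice([-1.0, 0.0, 1.0])
    return w_new, theta
```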

Selection
Fitness: Perf_D(f, r) ∈ [−1, 1]
o Correlation: E_D[f(x)·r(x)]
For r ∈ R, sample M(r) p times: r_1, r_2, ..., r_p
Estimate the empirical performance of r and of each r_i using s samples: Perf~_D(f, r_i)
Natural selection acts on these empirical performance estimates
p and s are feasible (polynomial in n, 1/ε)
[Diagram: the empirical performance scale from −1 to 1, with the threshold t determining which mutations count as beneficial or neutral.]
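
Below is a rough sketch (my own reading of Valiant's selection rule, not verbatim from the slides) of one selection step: representations are treated as callables returning values in [−1, 1], empirical performance is estimated from s samples, beneficial mutations (gain at least t over the current representation) are preferred, and neutral ones are the fallback. All names and parameters here are assumptions.

```python
import numpy as np

def empirical_perf(r, f, xs):
    """Perf~_D(f, r): average of f(x)*r(x) over the sample xs."""
    return float(np.mean([f(x) * r(x) for x in xs]))

def select(r, mutate, f, sample_D, p, s, t, rng=np.random.default_rng()):
    """One generation of selection: draw p mutations of r, estimate the
    performance of each on s fresh samples, prefer beneficial mutations
    (gain >= t over r) and fall back to neutral ones (within t of r)."""
    base = empirical_perf(r, f, [sample_D() for _ in range(s)])
    pool = [mutate(r) for _ in range(p)]
    perfs = [empirical_perf(c, f, [sample_D() for _ in range(s)]) for c in pool]
    beneficial = [c for c, v in zip(pool, perfs) if v >= base + t]
    neutral = [c for c, v in zip(pool, perfs) if abs(v - base) < t]
    candidates = beneficial or neutral
    if not candidates:             # no surviving mutation; in Valiant's model
        return r                   # this case corresponds to failure/extinction
    return candidates[rng.integers(len(candidates))]
```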

Evolvability
A class of functions C is evolvable over D if there exist an evolution algorithm (R, M) and a polynomial g(·,·) s.t. for every f ∈ C, r ∈ R, ε > 0 and every sequence r_0 = r, r_1, r_2, ... where r_{i+1} ← Select(R, M, r_i), it holds w.h.p. that Perf_D(f, r_{g(n, 1/ε)}) ≥ 1 − ε
Evolvable (distribution-independently): evolvable for all D by the same mutation algorithm (R, M)
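
Putting the pieces together, here is a sketch (mine, reusing the hypothetical select helper from the previous snippet) of the process the definition quantifies over: run g(n, 1/ε) generations of selection starting from r_0 and check the final performance.

```python
def evolve(r0, mutate, f, sample_D, generations, p, s, t):
    """Run the sequence r_0, r_1, ..., r_g with r_{i+1} = Select(R, M, r_i),
    using the select() sketch above; `generations` plays the role of g(n, 1/eps)."""
    r = r0
    for _ in range(generations):
        r = select(r, mutate, f, sample_D, p, s, t)
    return r   # success means Perf_D(f, r) >= 1 - eps w.h.p.
```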

Limits of evolvability
Feedback is restricted to values of Perf~_D(f, r_i), each estimated from some polynomial number of samples s
[Diagram: a learning algorithm interacts with a Perf_D(s) oracle, submitting hypotheses h_1, h_2, ..., h_q and receiving values v_1, v_2, ..., v_q, where v_i = Perf~_D(f, h_i) evaluated on s fresh examples.]

Evolvable ⊆ CSQ learnable ⊆ PAC learnable
Learnable with the Perf oracle = CSQ learnable, and this contains Evolvable
Correlational Statistical Query (CSQ): to a query h, the CSQ oracle responds with any value v such that |v − E_D[f(x)·h(x)]| ≤ τ, for tolerance τ ≥ 1/poly(n, 1/ε)
Learning by Distances [Ben-David, Itai, Kushilevitz 90]; a restriction of the SQ model of Kearns [93]
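
For concreteness, a CSQ oracle can be simulated from random labeled examples by averaging; this sketch and its helper names are my own, not from the slides. By a standard Hoeffding bound, O(log(1/δ)/τ²) samples suffice for the estimate to be within τ of E_D[f(x)·h(x)] with probability at least 1 − δ.

```python
import numpy as np

def csq_oracle(h, sample_labeled_example, tau, delta=0.01):
    """Simulate a CSQ oracle from random examples (x, f(x)): average f(x)*h(x)
    over s = O(log(1/delta)/tau^2) samples, so the answer is within tau of
    E_D[f(x)*h(x)] with probability >= 1 - delta (Hoeffding bound)."""
    s = int(np.ceil(2 * np.log(2 / delta) / tau ** 2))
    samples = [sample_labeled_example() for _ in range(s)]
    return float(np.mean([y * h(x) for x, y in samples]))
```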

CSQ learnable ⊆ Evolvable [F. 08]
Combined with the previous slide: Learnable with the Perf oracle = CSQ learnable = Evolvable, all contained in PAC learnable

Proof outline
Replace queries for performance values with approximate comparisons
For a hypothesis h: X → [−1, 1], tolerance τ > 0 and threshold t, the CSQ_> oracle returns:
o 1 if E_D[f(x)·h(x)] ≥ t + τ
o 0 if E_D[f(x)·h(x)] ≤ t − τ
o 0 or 1 otherwise
Design an evolution algorithm with mutations that simulate comparison queries
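
A comparison query is easily answered given an approximate performance value; the sketch below (my own, reusing the hypothetical csq_oracle helper from the earlier snippet) simply thresholds an estimate that is accurate to within τ/2.

```python
def csq_gt(h, sample_labeled_example, t, tau):
    """CSQ_> oracle: returns 1 if E_D[f(x)h(x)] >= t + tau and 0 if it is
    <= t - tau; in between, either answer is acceptable. Implemented by
    thresholding an estimate accurate to within tau/2 w.h.p."""
    v = csq_oracle(h, sample_labeled_example, tau / 2)
    return 1 if v >= t else 0
```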

From comparisons to mutations
For a hypothesis h: X → [−1, 1], tolerance τ > 0 and threshold t, recall the CSQ_> oracle from the previous slide (1 if E_D[f(x)·h(x)] ≥ t + τ, 0 if ≤ t − τ, either answer otherwise)
[Diagram: the current representation mutates to r_0 = 0 with probability 1 − δ and to r_1 = h with probability δ; the beneficial/neutral threshold is set to t.]
Mutation pool size p = O(log(1/δ)/δ)
Performance sample size s = O(log(1/δ)/τ²)
With probability at least 1 − δ, Select(r) = r_b where b is a valid response to the comparison query
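
A rough sketch of this reduction (my own rendering with simplified bookkeeping and selection rule, not the construction verbatim): the mutator outputs the "answer 0" candidate with probability 1 − δ and the "answer 1" candidate with probability δ; one generation of selection with beneficial/neutral threshold t then yields a valid answer to the comparison query w.h.p.

```python
import numpy as np

def comparison_step(r_answer0, r_answer1, f, sample_D, t, tau, delta,
                    rng=np.random.default_rng()):
    """Simulate one CSQ_> query by a single generation of selection.
    r_answer1 encodes the query hypothesis h (r_1 = h here, phi_i + h_i later
    in the proof), so it is beneficial (gain >= t over r_answer0) exactly when
    the true answer to the comparison query is 1."""
    p = int(np.ceil(np.log(1 / delta) / delta))                   # mutation pool size
    s = int(np.ceil(2 * np.log(2 * (p + 1) / delta) / tau ** 2))  # samples per estimate
    pool = [r_answer1 if rng.random() < delta else r_answer0 for _ in range(p)]
    def perf(r):
        return float(np.mean([f(x) * r(x) for x in (sample_D() for _ in range(s))]))
    base = perf(r_answer0)
    perfs = [perf(r) for r in pool]
    beneficial = [r for r, v in zip(pool, perfs) if v >= base + t]
    neutral = [r for r, v in zip(pool, perfs) if abs(v - base) < t]
    chosen = (beneficial or neutral or [r_answer0])[0]            # simplified selection
    return 1 if chosen is r_answer1 else 0
```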

Simulating a CSQ_> algorithm
Need to answer q queries; let φ_i be the function obtained after answering i queries
To answer query h_i with threshold t_i:
[Diagram: the representation φ_i mutates to r_0 = φ_i with probability 1 − δ and to r_1 = φ_i + h_i with probability δ; the beneficial/neutral threshold is t_i.]
Mutation sample size p = O(log(1/δ)/δ)
Performance sample size s = O(log(1/δ)/τ²)
Leads to representations with values in [−q, q]! Rescale by 1/q to get functions with values in [−1, 1]
Given the answers to the queries, the algorithm can compute h such that Perf_D(f, h) ≥ 1 − ε, and mutate into it

CSQ learnable ⊆ Evolvable [F. 08; F. 09]
The reduction is robust to many variants of the model: e.g. optimizing selection; recombination [Kanade 11]; changing thresholds; the number of mutations

How powerful is CSQ learning?
SQ learning [Kearns 93]: the learner submits a query ψ(x, ℓ); the SQ oracle returns v such that |v − E_D[ψ(x, f(x))]| ≤ τ
Many known algorithms are essentially described in SQ or can easily be translated to SQ
o Boolean conjunctions, decision lists, simple geometric concepts, AC^0 [Kearns 93]
Several others were found with more effort
o Halfspaces with a large margin [Bylander 94]
o General LTFs [Blum, Frieze, Kannan, Vempala 96; Dunagan, Vempala 04]
General ML techniques
o Nearest neighbor
o Gradient descent
o SVM
o Boosting

Perceptron in SQ
Recall the Perceptron algorithm: add false negatives, subtract false positive examples
Use SQs to find the centroid of the false positives: E_D[x_1 · I(f(x) = −1) · I(h_t(x) = 1)] gives the first coordinate of the centroid (scaled by the probability of a false positive)
Use the centroid for the Perceptron update
[Diagram: the region of false positives between h_t and the target f.]
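
A sketch of this idea (mine, not from the slides): estimate each coordinate of the error centroids with one statistical query, simulated here by sample averages, and use them in place of individual misclassified examples. For simplicity the halfspace passes through the origin, so there is no separate threshold; all names are assumptions.

```python
import numpy as np

def error_centroid(w, labeled_samples, label):
    """One statistical query per coordinate, simulated by a sample average:
    estimate E_D[x * I(f(x) = label) * I(sign(w.x) != label)]."""
    contribs = [np.asarray(x, dtype=float)
                if (y == label and np.sign(np.dot(w, x)) != label)
                else np.zeros(len(w))
                for x, y in labeled_samples]
    return np.mean(contribs, axis=0)

def sq_perceptron_step(w, labeled_samples):
    """Perceptron update driven by statistical queries: add the centroid of the
    false negatives (f(x) = +1, predicted -1) and subtract the centroid of the
    false positives (f(x) = -1, predicted +1), instead of individual examples."""
    return (w + error_centroid(w, labeled_samples, +1)
              - error_centroid(w, labeled_samples, -1))
```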

If D is fixed then SQ ⊆ CSQ
Decompose any statistical query into a CSQ and a constant: writing ψ(x, ℓ) = g(x) + ℓ·ψ_1(x) with g(x) = (ψ(x, 1) + ψ(x, −1))/2 and ψ_1(x) = (ψ(x, 1) − ψ(x, −1))/2, we get E_D[ψ(x, f(x))] = E_D[g(x)] + E_D[f(x)·ψ_1(x)]; the first term does not depend on f and can be computed or estimated when D is known, and the second is a CSQ
Corollary: linear threshold functions are evolvable over any fixed distribution D
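
A sketch of this decomposition in code (hypothetical helper names, with both terms simulated by sampling): the label-independent term is estimated from unlabeled draws from D, which is possible when D is fixed and known, and the remaining term is a correlational query.

```python
import numpy as np

def sq_via_csq(psi, sample_x, sample_labeled_example, tau, delta=0.01):
    """Answer the statistical query psi(x, l) within tolerance tau using only
    a correlational query plus a label-independent term:
        E[psi(x, f(x))] = E[g(x)] + E[f(x) * psi1(x)],
    where g(x) = (psi(x,1) + psi(x,-1))/2 and psi1(x) = (psi(x,1) - psi(x,-1))/2.
    E[g(x)] needs no labels, so it can be estimated when D is fixed/known."""
    s = int(np.ceil(8 * np.log(4 / delta) / tau ** 2))   # tau/2 accuracy per term
    g_term = np.mean([(psi(x, 1) + psi(x, -1)) / 2
                      for x in (sample_x() for _ in range(s))])
    corr_term = np.mean([y * (psi(x, 1) - psi(x, -1)) / 2
                         for x, y in (sample_labeled_example() for _ in range(s))])
    return float(g_term + corr_term)
```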

Distribution-independent CSQ
Single points are learnable [F. 09]
Characterization of weak learning [F. 08]
o Weak learning: better than random guessing, i.e. E_{x∼D}[f(x)·h(x)] ≥ 1/poly(n, 1/ε)
o C is weakly CSQ learnable if and only if every function in C can be represented as a linear threshold function with a significant margin over a poly-size set of Boolean features
General linear thresholds are not weakly CSQ learnable [Goldmann, Hastad, Razborov 95] (but are SQ learnable)
Conjunctions are not CSQ learnable [F. 11]

Further directions
o Characterize (strong) evolvability (= CSQ learning)
o Strengthen the lower bound for conjunctions
o Are thresholds on a line evolvable distribution-independently?
o If performance is based on the squared loss E_D[(f(x) − r(x))²] rather than correlation, then all of SQ is evolvable [F. 09]