Active Testing. Liu Yang. Joint work with Nina Balcan, Eric Blais and Avrim Blum. 2011



Property Testing. Instance space X = R^n, with a distribution D over X. Tested function f : X -> {0,1}. A property P of Boolean functions is a subset of all Boolean functions h : X -> {0,1} (e.g., linear threshold functions). dist_D(f, P) := min_{g in P} Pr_{x~D}[f(x) ≠ g(x)]. Standard type of query: membership query. 2

Property Testing. If f ∈ P, the tester should accept with probability at least 2/3. If dist_D(f, P) > ε, it should reject with probability at least 2/3. E.g., union of d intervals on [0,1]: 0 ++++ +++++++++ ++ +++ 1. Is this a union of 4 intervals (UINT_4)? Yes. UINT_3? Depends on ε. Model selection: testing can tell us how big d needs to be for the class to be close to the target (double and guess: d = 2, 4, 8, 16, ...). 3
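The double-and-guess idea can be sketched as follows. This is a toy sketch, not the actual tester: `count_intervals` and the accept rule are assumptions that treat f as a 0/1 labeling of a fine grid on [0,1].

```python
def count_intervals(labels):
    """Number of maximal runs of 1s in a 0/1 labeling of a grid on [0,1]."""
    runs, prev = 0, 0
    for b in labels:
        if b == 1 and prev == 0:
            runs += 1
        prev = b
    return runs

def toy_tester(labels, d):
    """Toy stand-in for a UINT_d tester: accept iff f is a union of <= d intervals."""
    return count_intervals(labels) <= d

def smallest_power_of_two_d(labels):
    """Double and guess: try d = 2, 4, 8, ... until the tester accepts."""
    d = 2
    while not toy_tester(labels, d):
        d *= 2
    return d

# A labeling with 5 positive runs: the doubling loop stops at d = 8.
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1]
```

In the real model the tester would answer each "is f in UINT_d?" question from a few label queries rather than from the full labeling.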

Property Testing and Learning: Motivation. What is property testing for? Quickly tell if we have the right function class to use; estimate the complexity of a function without actually learning it. We want to do this with fewer queries than learning requires. 4

The Standard Model Uses Membership Queries. Results of testing basic Boolean functions with MQs: constant query complexity for UINT_d, dictatorships, LTFs, and more. However... 5

Membership Queries are Unrealistic for Machine Learning Problems. Recognizing cat vs. dog? An MQ can present an arbitrary synthetic image and ask: is this a dog or a cat? 6

Membership Queries are Unrealistic for Machine Learning Problems. 7

Passive: Wastes Too Many Queries. ML people moved on to the passive model (samples drawn from D): queried samples exist in nature, but the model is quite wasteful (many examples are uninformative). Can we save on the number of queries? 8

Active Testing: the new model of property testing. 9

Property Tester. Definition. An s-sample, q-query ε-tester for P over the distribution D is a randomized algorithm A that draws s samples from D (cheap), sequentially queries for the value of f on q of those samples (expensive), and then: 1. Accepts w.p. at least 2/3 when f ∈ P. 2. Rejects w.p. at least 2/3 when dist_D(f, P) > ε. 10
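The definition has a simple algorithmic shape: sample, then query a few of the samples, then decide. A minimal sketch of that shape; the property ("f is constant") and the accept rule (all queried labels agree) are illustrative assumptions, not from the talk.

```python
import random

def sample_query_tester(f, draw_sample, s, q, rng):
    """Skeleton of an s-sample, q-query tester: draw s unlabeled samples
    (cheap), query f on q of them (expensive), then accept or reject.
    Decision rule here: accept iff the q labels are consistent with a
    constant function -- an illustrative stand-in for a real property."""
    pool = [draw_sample() for _ in range(s)]   # s cheap unlabeled samples
    queried = rng.sample(pool, q)              # pick q points to label
    labels = [f(x) for x in queried]           # q expensive label queries
    return len(set(labels)) <= 1               # accept iff all labels agree

rng = random.Random(0)
draw = lambda: rng.random()          # D = uniform on [0, 1]
always_one = lambda x: 1             # f in the property
half_split = lambda x: 1 if x < 0.5 else 0   # f far from constant
```

A constant function is always accepted; a far-from-constant function is rejected with high probability once q labels are seen.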

Active Tester. Definition. A randomized algorithm is a q-query active ε-tester for a property P of functions R^n -> {0,1} over D if it is a poly(n)-sample, q-query ε-tester for P over D. 11

Active Property Testing. Testing as a preprocessing step for learning. We need examples where active testing: gets the same query-complexity savings as MQ; is better in query complexity than passive testing; needs fewer queries than learning. Union of d intervals is such a case: 0 ++++ +++++++++ ++ +++ 1. Testing tells us how big d needs to be for the class to be close to the target. Label complexity: active testing needs O(1), passive testing needs Θ(√d), active learning needs Θ(d). 12

Outline. Our results for various classes. Testing disjoint unions of testable properties. General testing dimension. 13

Our Results (NEW!)
Class                   Active Testing   Passive Testing   Active Learning
Union of d intervals    O(1)             Θ(√d)             Θ(d)
Union of d thresholds   O(1)             Θ(√d)             const
Dictator                Θ(log n)         Θ(log n)          Θ(log n)
Linear threshold fn     O(√n)            Θ̃(√n)             Θ(n)
Cluster assumption      O(1)             Ω(√N)             Θ(N)
Active testing is MQ-like on testing UINT_d, and passive-like on testing dictators. 14

Testing Unions of Intervals. 0 ++++ +++++++++ ++ +++ 1. Theorem. Testing UINT_d in the active testing model can be done using O(1/ε³) queries. Under the uniform distribution, we need only O(√d/ε⁵) unlabeled examples. Proof idea: noise sensitivity := Pr[two close points get different labels]; every function in UINT_d has low noise sensitivity, whereas every function far from the class has noticeably larger noise sensitivity; so build a tester that estimates the noise sensitivity of the input function. 15

Testing Unions of Intervals (cont.) Definition: Fix δ > 0. The local δ-noise sensitivity of a function f : [0,1] -> {0,1} at x ∈ [0,1] is NS_δ(f, x) = Pr_y[f(x) ≠ f(y)], where y is a uniformly random point within distance δ of x. The noise sensitivity of f is NS_δ(f) = E_{x~U[0,1]}[NS_δ(f, x)]. Proposition (easy): Fix δ > 0. Let f : [0,1] -> {0,1} be a union of d intervals. Then NS_δ(f) ≤ dδ. Lemma (hard): Fix δ = ε²/(32d). Let f : [0,1] -> {0,1} be a function with noise sensitivity bounded by NS_δ(f) ≤ dδ(1 + ε/4). Then f is ε-close to a union of d intervals. 16
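The tester's core step is estimating NS_δ(f) from random close pairs. A minimal Monte Carlo sketch, assuming the perturbation y ~ Uniform[x−δ, x+δ] clipped to [0,1] (one natural reading of the definition); the interval endpoints below are made up for illustration.

```python
import random

def union_of_intervals(intervals):
    """Indicator function of a union of intervals on [0, 1]."""
    return lambda x: any(a <= x <= b for a, b in intervals)

def estimate_noise_sensitivity(f, delta, trials, rng):
    """Estimate NS_delta(f): draw x ~ U[0,1] and a close point
    y ~ U[x-delta, x+delta] (clipped to [0,1]); count label disagreements."""
    flips = 0
    for _ in range(trials):
        x = rng.random()
        y = min(1.0, max(0.0, x + rng.uniform(-delta, delta)))
        flips += (f(x) != f(y))
    return flips / trials

rng = random.Random(1)
f = union_of_intervals([(0.05, 0.2), (0.4, 0.6), (0.8, 0.95)])  # d = 3 intervals
ns = estimate_noise_sensitivity(f, delta=0.05, trials=20000, rng=rng)
# Proposition: NS_delta(f) <= d * delta = 0.15, so the estimate stays small.
```

Functions far from UINT_d (e.g., rapidly alternating labels) would produce a noticeably larger estimate, which is what the tester thresholds on.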

[Figure: the δ-neighborhoods around the interval boundaries of a union of intervals on [0,1]; label disagreements on close pairs occur only near boundaries.] 17

Testing Unions of Intervals (cont.) Theorem. Testing UINT_d in the active testing model can be done using O(1/ε³) queries. Under the uniform distribution, we need only O(√d/ε⁵) unlabeled examples. 18

Testing Linear Threshold Fns. Theorem. We can efficiently test LTFs under the Gaussian distribution with Õ(√n) labeled examples in both the active and passive testing models. We have lower bounds of Ω̃(n^{1/3}) for active testing and Ω̃(√n) on the number of labels needed for passive testing. Learning an LTF needs Ω(n) labels even under the Gaussian distribution, so testing is strictly cheaper than learning in this case. 19

Testing Linear Threshold Fns (cont.) Definition: Hermite polynomials: h_0(x) = 1, h_1(x) = x, h_2(x) = (x² − 1)/√2, ... form a complete orthonormal basis under ⟨f, g⟩ = E_x[f(x)g(x)], where E_x is over the standard Gaussian distribution. For any S ∈ N^n, define H_S(x) = Π_i h_{S_i}(x_i). The Hermite coefficient of f : R^n -> R corresponding to S is f̂(S) = ⟨f, H_S⟩, and the Hermite decomposition of f is f = Σ_{S ∈ N^n} f̂(S) H_S. The degree of the coefficient f̂(S) is |S| = Σ_i S_i. 20
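The normalized basis above can be generated from the probabilists' Hermite polynomials via their standard three-term recurrence, He_{n+1}(x) = x·He_n(x) − n·He_{n−1}(x), with h_n = He_n/√(n!). A small sketch:

```python
import math

def hermite_normalized(n, x):
    """h_n(x) = He_n(x) / sqrt(n!), where He_n are the probabilists'
    Hermite polynomials: He_0 = 1, He_1 = x,
    He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x).
    These h_n are orthonormal for <f, g> = E[f(x) g(x)], x ~ N(0, 1)."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x            # He_0, He_1
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur / math.sqrt(math.factorial(n))
```

For example, the recurrence reproduces h_2(x) = (x² − 1)/√2 from the slide.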

Testing Linear Threshold Fns (cont.) Lemma: There is an explicit continuous function W : R -> R with bounded derivative such that every LTF f : R^n -> {−1, 1} satisfies Σ_i f̂(e_i)² = W(E[f]). Also, every function g : R^n -> {−1, 1} whose degree-1 Hermite weight Σ_i ĝ(e_i)² is within a small power of ε of W(E[g]) is ε-close to an LTF. Lemma: For any function f : R^n -> R, Σ_i f̂(e_i)² = E_{x,y}[f(x) f(y) ⟨x, y⟩], where x, y are independent standard Gaussians and ⟨x, y⟩ is the standard vector dot product. 21

Testing Linear Threshold Fns (cont.) The U-statistic (of order 2) with symmetric kernel function g : R^n × R^n -> R is U = (2/(m(m−1))) Σ_{i<j} g(x_i, x_j), an unbiased estimator of E[g(X, Y)] from m samples. 22
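In code, the order-2 U-statistic just averages a symmetric kernel over all unordered pairs of sample points. The kernel below, g(x, y) = x·y, is a one-dimensional illustrative stand-in for the f(x)f(y)⟨x, y⟩ kernel used in the LTF test.

```python
from itertools import combinations

def u_statistic(xs, kernel):
    """Order-2 U-statistic: U = (2 / (m (m - 1))) * sum_{i < j} kernel(x_i, x_j).
    An unbiased estimate of E[kernel(X, Y)] for independent X, Y."""
    m = len(xs)
    total = sum(kernel(a, b) for a, b in combinations(xs, 2))
    return 2.0 * total / (m * (m - 1))

# Toy symmetric kernel (stand-in for f(x) f(y) <x, y>).
dot = lambda a, b: a * b
```

On the sample [1, 2, 3] this averages the three pair products 2, 3, 6, giving 11/3.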

Outline. Our results for various classes. Testing disjoint unions of testable properties. General testing dimension. 23

Testing Disjoint Unions of Testable Properties. Combine a collection of properties P_i via their disjoint union. Theorem. Given properties P_1, ..., P_N, if each P_i is testable over D_i with q(ε) queries and U(ε) unlabeled samples, then their disjoint union P is testable over the combined distribution D with O(q(ε/2) · log³(1/ε)) queries and O(U(ε/2) · (N/ε) · log³(1/ε)) unlabeled samples. 24

Outline. Our results for various classes. Testing disjoint unions of testable properties. General testing dimension. 25

Why a General Theory? These are only a few among many possible testing problems, and we don't want to solve each problem one at a time. It would be good to have a general theory that distinguishes easily-testable from hard-to-test problems. 26

General Testing Dimension. The testing dimension characterizes (up to constant factors) the intrinsic number of label requests needed to test the given property with respect to the given distribution. All our lower bounds are proved via testing dimensions. 27

Minimax Argument. min_Alg max_f P(Alg mistaken) = max_{π₀} min_Alg P(Alg mistaken). W.l.o.g. π₀ = ½π + ½π′, with π supported on Π₀ (functions with the property) and π′ on Π_ε (functions ε-far from it). Let π_S, π′_S be the induced distributions on the labels of a query set S. For a given π₀, min_Alg P(Alg makes a mistake on S) ≥ ½(1 − d_TV(π_S, π′_S)). 28
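The quantity driving the bound is the total variation distance between the two induced label distributions on S. A tiny sketch; the two distributions over labelings below are made-up numbers for illustration.

```python
def total_variation(p, q):
    """Total variation distance between two distributions given as dicts
    mapping outcomes (e.g., labelings of S) to probabilities."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in support)

# Induced distributions over labelings of a 2-point query set S (illustrative):
# pi_S labels the two points identically; pi2_S labels them uniformly at random.
pi_S  = {(0, 0): 0.5, (1, 1): 0.5}
pi2_S = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

Here d_TV = 0.5, so any algorithm seeing only the labels of S errs with probability at least ½(1 − 0.5) = 0.25 under the mixture.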

Passive Testing Dim. Define d_passive as the largest q ∈ N such that, for some π over Π₀ and π′ over Π_ε, the induced label distributions π_S and π′_S are close in variation distance in expectation over S ~ D^q, so no q-sample passive tester can distinguish them. Theorem: The sample complexity of passive testing is Θ(d_passive). 29

Coarse Active Testing Dim. Define d_coarse as the largest q ∈ N such that, for some π over Π₀ and π′ over Π_ε, no q-query active strategy can distinguish the induced label distributions. Theorem: If d_coarse = O(1), active testing of P can be done with O(1) queries; if d_coarse = ω(1), it cannot. 30

Active Testing Dim. Fair(π, π′, U): a distribution over labeled examples (y, ℓ): w.p. ½ choose y ~ π restricted to U, ℓ = 1; w.p. ½ choose y ~ π′ restricted to U, ℓ = 0. err*(H; P): the error of the optimal function in H w.r.t. data drawn from the distribution P over labeled examples. Given u = poly(n) unlabeled examples, d_active(u) is the largest q ∈ N such that every q-query strategy has error close to 1/2 on Fair(π, π′, U), i.e., q labels do not suffice to distinguish π from π′. Theorem: Active testing with failure probability 1/8 using u unlabeled examples needs Ω(d_active(u)) label queries; it can be done with O(u) unlabeled examples and O(d_active(u)) label queries. 31

Testing Dim: A Powerful Notion. We use the testing dimension to prove lower bounds for testing unions of intervals and LTFs. Lemma A. Let π be over Π₀ and π′ over Π_ε. Fix U ⊆ X to be the set of allowable queries. Suppose that for every S ⊆ U with |S| = q there is a set E_S ⊆ R^q (possibly empty) satisfying π_S(E_S) ≤ 2^{−q}/5 such that π_S and π′_S are close outside E_S. Then any q-query tester has a large probability of making a mistake. 32

Application: Dictator fns. Theorem: Active testing of dictatorships under the uniform distribution requires Ω(log n) queries. This holds even for distinguishing dictators from random functions. Consequently, any class that contains the dictator functions requires Ω(log n) queries to test in the active model, including decision trees, functions of low Fourier degree, juntas, DNFs, etc. 33

Application: Dictator fns (cont.) π: uniform over the dictator functions; π′: uniform over all Boolean functions. S: a set of q vectors in {0,1}^n, viewed as a q×n Boolean matrix with columns c_1(S), ..., c_n(S). Choose S as q vectors drawn uniformly and independently at random from {0,1}^n. For any y ∈ {0,1}^q, E[# cols of S equal to y] = n·2^{−q}. Since the columns are drawn independently at random, Chernoff bounds show every column count is close to n·2^{−q} w.h.p. Now apply Lemma A. 34
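The counting step can be checked empirically: each of the n columns of the random q×n matrix equals a fixed y ∈ {0,1}^q with probability 2^{−q}, so the expected number of matches is n·2^{−q}. A seeded simulation (the parameters are illustrative):

```python
import random

def count_matching_columns(q, n, y, rng):
    """Draw q uniform vectors from {0,1}^n (rows of a q x n Boolean matrix)
    and count how many of the n columns equal the fixed vector y in {0,1}^q."""
    rows = [[rng.randint(0, 1) for _ in range(n)] for _ in range(q)]
    cols = zip(*rows)
    return sum(1 for c in cols if list(c) == y)

rng = random.Random(42)
q, n = 3, 8
y = [0] * q                 # a fixed target column in {0,1}^q
trials = 2000
avg = sum(count_matching_columns(q, n, y, rng) for _ in range(trials)) / trials
# E[count] = n * 2**(-q) = 8 / 8 = 1.0; the seeded average lands near it.
```

When q is well below log n, every count concentrates around n·2^{−q}, which is what the Chernoff step in the lower bound uses.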

Application: LTFs. Theorem. For LTFs under the standard n-dimensional Gaussian distribution, d_passive = Ω((n/log n)^{1/2}) and the active query complexity is Ω((n/log n)^{1/3}). π: the distribution over LTFs obtained by choosing w ~ N(0, I_{n×n}) and outputting f(x) = sgn(w·x). π′: the uniform distribution over all functions. To obtain d_passive: bound the total variation distance between the distribution of Xw/√n and N(0, I_{n×n}). To obtain the active query complexity: argue as in the dictator lower bound, but relying on strong concentration bounds for the spectra of random matrices. 35

Open Problems. Matching lower and upper bounds for active testing of LTFs (currently Õ(√n) upper vs. Ω̃(n^{1/3}) lower). Tolerant testing, i.e., distinguishing dist ≤ ε/2 from dist > ε, for UINT_d and LTFs. 36

Thanks! 37