Testing Complexity of Program Segments. McGill University University Street. Duke University

Using Vapnik-Chervonenkis Dimension to Analyze the Testing Complexity of Program Segments Kathleen Romanik æ School of Computer Science McGill University 3480 University Street Montreal, Quebec, Canada H3A 2A7 Jeærey Scott Vitter y Department of Computer Science Duke University Durham, NC 27708-0129 æ Part of this research was done while the ærst author was a graduate student at the University of Maryland at College Park. The research was supported in part by NSF Grant CCR-9112976, ONR Grant N00014-92- J-1254, NSERC, and IRIS National Network of Centres of Excellence. y Supported in part by NSF Grant CCR-9007851. 1

Testing Complexity of Program Segments Kathleen Romanik DIMACS Center for Discrete Mathematics and Theoretical Computer Science P.O. Box 1179 Rutgers, The State University of New Jersey Piscataway, NJ 08855-1179 2

Abstract: We examine the complexity of testing diæerent program constructs. We do this by deæning a measure of testing complexity known as VCP-dimension, which is similar to the Vapnik-Chervonenkis dimension, and applying it to classes of programs, where all programs in a class share the same syntactic structure. VCP-dimension gives bounds on the number of test points needed to determine that a program is approximately correct, so by studying it for a class of programs we gain insight into the diæculty of testing the program construct represented by the class. We investigate the VCP-dimension of straight line code, if-thenelse statements, and for loops. We also compare the VCP-dimension of nested and sequential if-then-else statements as well as that of two types of for loops with embedded if-then-else statements. Finally, we perform an empirical study to estimate the expected complexity of straight line code. 3

1 Introduction Program testing is an important subæeld of the æeld of software engineering. Much work has been done in ænding methods for selecting test data ëgg75, MH81, DMMP87ë and in evaluating diæerent testing methodologies ëbud81, DN84, BS87, Ham89ë. A related area of software engineering is the study of software complexity. Considerable research has been done in this area as well to devise software complexity measures ëhal77, McC76, WHH79ë and to compare various measures for their eæectiveness ëwey88, Tia92, TZ92ë. Our work combines these two areas by looking at the ëtesting complexity" of diæerent classes of programs. The reason that we study classes of programs is that by examining the complexity of a class of programs, where each program in the class has the same syntactic structure, we gain insight into the testing complexity of the syntactic structure that these programs share. Therefore, given a class of programs, each with the same syntactic structure, we investigate how diæcult it is to distinguish one program in the class from the others using only inputèoutput test pairs. Usually this is impossible to do with 100è accuracy. In other words, for most classes of programs it is impossible to distinguish one program in the class from all other programs that compute a diæerent function when only a ænite number of inputèoutput test pairs is used to test the program. For this reason we introduce another measure of testing complexity. In the æeld of computational learning theory ëvc71, BEHW89ë Vapnik-Chervonenkis dimension èor VCdimensionè characterizes the complexity of a class of objects to be learned. We deæne a similar notion of dimension for program classes that will give an indication of the testing complexity of these classes. Using this notion we can compare the testing complexity of diæerent classes of programs and gain insight into how diæcult it is to test various program constructs as well as determine how the complexity increases when program constructs are combined in diæerent ways. This insight is important because it tells the programmer which types of program structures lead to more easily testable programs, and it shows the tester where more concentrated testing eæorts should be applied. 2 Measuring Testing Complexity When we examine the testing complexity of classes of programs, we consider only a subset of programs that compute total recursive functions from the rationals èqè to the rationals. However, we feel that this subset is suæcient to provide insight into the relative testing complexity of diæerent program constructs. 4

Deænition. A program p computes a function f p : Q! Q, where a probability measure M is deæned over the set of inputs. A program class is a set P of programs. We will use p to denote both the program and the function f p that it computes when it is clear from the context which meaning is being used. Programs are deæned to compute functions with domain Q because the ænite representation of numbers in the computer only allows rationals. However, sometimes it is necessary to consider an extension of such a function to the reals, and in this case we will use the natural extension. The probability measure M is usually taken to be either the operational distribution on the inputs to the program, or a uniform distribution. In the former case, the error of a program èthat is, the probability that an input chosen at random according to M will produce an incorrect outputè is a measure of its unreliability; in the latter case, the error measures the fraction of the input domain for which the program computes an incorrect answer. The probability measure M can either be a discrete probability measure on Q or a continuous probability measure on the reals, R. When deæning a measure of testing complexity for a class of programs, we would ideally like a measure that can tell us how many test points èthat is, inputèoutput pairsè are needed to distinguish one program from all other programs in the class. We can deæne this notion formally as follows: Deænition. Given a program class P and a program p 2 P,atest set for p with respect to P is a set of inputs T ç Q such that for all other programs q 2 P,ifpèxè=qèxè for all x 2 T, then pèxè =qèxè for all x 2 Q èthat is, p and q compute the same functionè. The testing complexity of the class P is the smallest integer k such that any program p 2 P has a test set of cardinality k. The following results about testing complexity can be proven easily. Proposition 2.1. The testing complexity of a program class containing n programs is less than or equal to n, 1. Proof. Let P be a class of n programs, and let p 2 P be given. For each q 2 P that computes a diæerent function than p, choose an input x for the test set of p such that pèxè 6= qèxè. There will be at most n, 1 such test points. ut Proposition 2.2. The class of all programs computing polynomials of degree no greater than n has testing complexity n +1. 5

Proof. Since n + 1 points completely determine an n degree polynomial, any program in this class has a test set of size n +1. ut Although this deænition of testing complexity yields some results, it does not allow us to compare diæerent program constructs. This is because when more complicated program constructs are used, even very simple programs become impossible to ëtest", as the following example illustrates. Example 2.1. Let the program class P be deæned by the following schema: P := fp j pèxè :=ifa 1 æ 1 b 1 then outputèa 2 æ 2 b 2 è else outputèa 3 æ 3 b 3 èg; where æ 1 2f=;6=;é;é;ç;çg and æ 2 ; æ 3 2f+;,;æ;æg and a i ;b i 2fxgëQ Now, consider the problem of selecting a test set for the program q 2 P deæned as follows: qèxè :=if0é3 then outputèx +2è else outputèx, 3è: Since the boolean expression in the if-then-else statement is always true, program q simply computes the linear function x + 2. Although this may seem like a contrived program since the boolean expression is obviously always true and one branch is never executed, real programs with more complicated branching structures can contain unexecutable paths that are not easily detected. Barzdin, Bicevskis and Kalninsh ëbbk77ë proved that the problem of determining which branches of a program are realizable èa branch is realizable if some input causes the branch to be taken during execution of the programè is undecidable. Proposition 2.3. The program q has no ænite test set with respect to the class P. Proof. Suppose a ænite test set T for q existed. Let m := maxfx j x 2 Tg. Deæne q 0 èxè :=ifxçmthen outputèx +2è else outputèx, 2è: 6

Program q 0 is in P since it has the syntax speciæed by the class. It computes the same output as q for all inputs in T.However, it computes an incorrect value for all xém,sot is not a test set for q. ut The program q has no test set with respect to P because it does not make use of both branches of the if-then-else statement. Thus, the unused branch can be used by another program in P to ëtrick" any supposed test set. Such a program looks like q on all inputs in the test set, but it diverges from q on inputs greater than those in the test set. The example above demonstrates that it is impossible to test most programs to within 100è accuracy, even when they are tested with respect to simple program classes. Since absolute testing is an impossible task, we propose in the next section a less absolute but more meaningful measure of testing complexity. This measure indicates when it is possible to use a small number of randomly selected test points to determine whether any program in a given class is approximately correct with high probability. A program is approximately correct if on the average it computes a value ëclose to" the value of a program that is absolutely correct. These ideas will be made more formal in the next section. We emphasize that we are not proposing random testing as an eæective testing methodology, but rather we are using it as a basis for comparing the testing complexity of diæerent program constructs and combinations of constructs. However, other work has been done recently ëbk89, BLR90, Lip91, GLR + 91ë to make random testing a viable approach to testing software. The problem that is addressed in this other work is how to convert a program that has been shown through random testing to be correct for most inputs into a program that is correct with high probability on all inputs. The following simple example illustrates this idea. Example 2.2. Suppose we are given a black box program p that computes the function fèxè =x. Suppose that we have performed a suæcient number of random tests on p to ensure that it is correct for most inputs. The following program q calls p as a subroutine and uses a random number generator to compute the function f. Ifpis correct for most inputs, then q computes f correctly on all inputs with high probability. It is assumed that q has access to fault-free addition and subtraction operators. qèxè := begin end y =random; z = pèx + yè, pèyè; outputèzè 7

3 Random Approximate Testing In order to deæne a meaningful measure of testing complexity, we deæne a model for random approximate testing of programs and relate a measure to this model. First we deæne the notion of error for a program. Deænition. Given a program class P and a program p 2 P, the error of a program q 2 P with respect to p and the probability measure M is deæned by Z E M èq; pè= jqèxè,pèxèjdmèxè: R In this deænition p represents a speciæcation, and q represents a program to be tested against the speciæcation. The error of a program is the expected diæerence in value between its output and the speciæcation for a randomly drawn input. If this error is bounded by æ, then q is approximately correct with respect to p for error bound æ. Note that the error function E M is a pseudo-metric 1 on P. When the probability measure M is clear from context, we just use E for the error function. Now we deæne what it means for a class of programs to be randomly approximately testable. Let I denote the open interval of rationals è0; 1è, and let m: I æ I! Z + be a positive integer valued function deæned on I æ I. Let P be a class of programs computing functions from Q to Q, and let M be a probability measure on Q. Deænition. P is randomly approximately testable èw.r.t. M è with test set size mèæ; æèif for all æ; æ 2 I and for all p 2 P, if a set T of mèæ; æè inputs is selected at random from Q according to M, then with probability at least 1, æ for all q 2 P,ifpèxè=qèxè for all x 2 T èthat is, if q is consistent with p on T è, then E M èq; pè ç æ. If the class P is randomly approximately testable, then given a speciæcation p 2 P,a conædence parameter æ 2 I, and an error bound æ 2 I, a ænite number of random test points can be selected, and with high probability these test points will ensure that any program that tests correctly on these points will be approximately correct. If the function m is a polynomial in 1 æ and æ 1, then the class P is polynomially randomly approximately testable or just polynomially testable. 3.1 A Complexity Measure for Random Approximate Testing Now that we have deæned a model for random approximate testing of programs, we must ænd a complexity measure that relates to this model. In computational learning theory ëvc71, 1 A pseudo-metric on a set P is a function E : P æ P! R + such that for all x; y; z 2 P the following three properties hold: è1è x = y è Eèx; yè =0,è2èEèx; yè =Eèy; xè, è3è Eèx; yè+eèy; zè ç Eèx; zè. 8

BEHW89ë Vapnik-Chervonenkis dimension èor VC-dimensionè characterizes the complexity of a class of objects to be PAC 2 learned. If this dimension is ænite, then any object in the class can be learned with high probability. This means that if a small number of random examples are selected and labeled according to a chosen object, then with high probability any other object in the class that is consistent with the chosen object on the examples will be approximately equal to that object. Since our model for random approximate testing is similar to the model of ëbehw89ë for PAC learning, we introduce a notion of dimension similar to VC-dimension for program classes that gives an indication of the testing complexity of these classes. This dimension allows us to determine when a small number of test points can be used to demonstrate that a program is approximately correct. In order to deæne this dimension we ærst identify a program with the set of intervals for which it evaluates to a positive number. These notions are formalized below. Deænition. Given a program class P, a ænite set of inputs T ç Q is shattered by P if for all S ç T, there exists p 2 P such that pèxè é 0 for all x 2 S and pèxè ç 0 for all x 2 T,S. The Vapnik-Chervonenkis program dimension of P èor simply VCP-dimensionèP èè is the largest integer k such that there exists a subset T of Q of cardinality k that is shattered by P.Ifno such k exists, the VCP-dimension of P is inænite. 3.2 VCP versus Pseudo Dimension A similar measure to VCP-dimension, called pseudo dimension, has been deæned by Haussler ëhau92ë. He has shown that if this dimension is ësmall" for a set of functions, then the set of functions can be PAC learned. We deæne this measure now and compare it to VCP-dimension. Deænition. èëhau92ëè For a family of functions F from a set S into R the pseudo dimension of F èor dimèf èè is the largest k such that there exist two sequences of length k, çx = èx 1 ;:::;x k è2 S k and ç t =èt1 ;:::;t k è2 R k, and for any subsequence çy =èx i1 ;:::;x i j èofçx, there exists f 2 F such that fèx i è+t i é0 for all x i 2 çy and fèx i è+t i ç0 for all x i 2 çx, çy èthat is, elements of the sequence çx that are not in the subsequence çyè. If no such k exists, then dimèf è is inænite. The pseudo dimension of a class of functions is the size of the largest set of inputs that can be shattered by the class using a vector èç tè of translation values. Pseudo dimension diæers from VCP-dimension because it allows for the possibility of translating a set of functions by a vector of constants before shattering a set of points. It is easy to see that for any class of functions F,VCP-dimensionèF è ç dimèf è since a 2 PAC stands for probably almost correct. 9

set of inputs that is shattered by F is still shattered if the vector of outputs produced by these inputs is translated by the vector 0 k. If arbitrary classes of functions are considered, then an inverse bound does not exist. That is, there are classes of functions with ænite VCPdimension and inænite pseudo dimension. For example, the class of all positive functions has VCP-dimension 0, but it has inænite pseudo dimension. Although pseudo dimension can be arbitrarily greater than VCP-dimension, Macintyre and Sontag ëms93ë have observed that given a class of functions F : S! R, a new class of functions F 0 : SæR! R can be created with the property that dimèf è ç VCP-dimensionèF 0 è èin fact, equality holdsè. This new class of functions F 0 is created from F by replacing each f 2 F with the new function fèxè, y, which takes two inputs x 2 S and y 2 R. Using the above observation of Macintyre and Sontag, the upper bounds that we obtain in this paper for the VCP-dimension of various classes of programs are the same as those that can be obtained for the pseudo dimension of these classes. This is because the upper bounds on the VCP-dimension still hold if we convert each program class P into a program class P 0, where each p 0 2 P 0 takes two inputs x and y and computes pèxè, y for some p 2 P. Therefore, the results of Haussler using pseudo dimension that we employ next to bound the number of test points needed to randomly approximately test a program are valid for the classes we consider. Haussler ëhau92ë has used the notion of pseudo dimension to get a bound on the number of random examples needed to PAC learn a function from a class of functions. These results can also be used to ænd a bound on the number of random test points needed to approximately test a program from a class of programs. This is because a random sample of inputèoutput pairs for a function that gives suæcient information to infer the rest of the function is also suæcient to determine that a program correct on the sample has small overall error. We now illustrate this relationship between PAC learning and random approximate testing in more detail. First we deæne several notions used in Haussler's result. Deænition. èëhau92ëè Let F be a family of functions from S = R æ R into ë0;kë. Let M be a probability measure on S. For any f 2 F the expected value Eèfè off on an example chosen at random from S according to M is R S fèxèdmèxè. The empirical estimate of this expected value on a random sample çx 2 S m is ^Eçx èfè := mp 1 mi=1 fèx i è. The ëcloseness" of these two values is given by the metric d ç, where çé0, deæned by d ç èx; yè:= jx,yj for any ç+x+y non-negative reals x and y. Haussler gives the following result relating ænite pseudo dimension to PAC learning of functions. This theorem states that for a family of functions F with codomain ë0;kë and with ænite pseudo dimension, to ensure that with high probability the empirical estimate of the expected value of any function in F on a random sample is ëclose to" its actual expected value, it is only necessary to choose a random sample whose size is a polynomial in the pseudo 10

dimension of F, the bound K on the codomain of F, and several approximation parameters. The notion of a permissible family of functions used in the theorem is a measurability assumption that must be made when the family F is uncountable. Since the programs we examine only use rational constants, they compute countable classes of functions, so we do not need to be concerned with this notion. Theorem 3.1. ëhau92ë Let F be apermissible family of functions from S = R ær into ë0;kë with dimèf è=d é 1.Let M be a probability measure on S.If a random sample çx of length m is drawn from S according to M, and if m ç 8K æ 2 ç ç 2d ln 8eK æç +ln8 æ for 0 éæ;æé1and for 0 éçç8k, then the probability that there exists f 2 F such that d ç è ^E çx èfè;eèfèè éæis at most æ. ç This theorem applies to PAC learning when the class F is deæned to be the class of loss functions associated with a class F of functions from R to R used to estimate or ëlearn" an unknown distribution of inputèoutput examples. A loss function L f for a function f measures, for each input to f, the error or ëloss" of f on that input. In this case the loss function is deæned by L f èx; yè =jfèxè,yj; that is, it measures how much the value given by f diæers from a given y for a given input x. If the class of loss functions associated with F has ænite pseudo dimension, then it is possible to use a small random sample to choose a function in F that has a small loss èthat is, its error is close to the inæmum of errors over all functions in Fè with respect to the unknown distribution of examples. This function will give a good representation of the unknown distribution of examples. We can also apply Haussler's result to the problem of testing. When testing programs we would like to use a small set of test points to detect programs that vary greatly from a given speciæcation. In particular, we would like to be able to say that any program that is correct on a small set of test points computes a function that is approximately correct with respect to the speciæcation. In order to apply Haussler's result to testing programs, we must use a class of functions with codomain ë0;kë. We do this by choosing a maximum loss value K and deæning a class of loss functions corresponding to the class of programs P. Since we test with respect to a speciæcation function p 2 P,wecan deæne the loss function L K p;q for a tested program q èw.r.t. pè by è jpèxè,qèxèj if jpèxè, qèxèj L K p;q èxè := ék K otherwise When we deæne the loss function in this way, the error of a tested program corresponds to the expected value of its loss function, assuming that the error of a program on a particular input is bounded above by K. 11

Haussler's work diæers from our approach since he uses probability measures deæned on R æ R, rather than just on the domain R of the class of functions F. However, our work can be put into his framework by deæning the following probability measure M p on R æ R corresponding to the probability measure M deæned on Q and the speciæcation program p: M p èx; yè:= è Mèxè if pèxè =y 0 otherwise M p is a probability measure that is 0 everywhere on R æ R except on the graph of p, and its marginal on R is M. By making these two adjustments, we can prove a result that is similar to Haussler's and apply it to the testing problem. First we relate the pseudo dimension of the class of loss functions associated with a given program p 2 P to that of the class P. Lemma 3.1. Let P be a class of programs computing functions from Q to Q. For any p 2 P and K 2 Q +, the class of loss functions L K p := flk p;q j q 2 Pg associated with p is a subset ç L K p;+ + L K p;,, such that dimèl K p;+è ç dimèp è and of the sum of two classes of functions L K p dimèl K p;,è ç dimèp è. Proof. We show this in three steps. First, by a result of Wenocur and Dudley ëwd81ë, for any p 2 P,ifwe deæne a new class of functions L p := fq, p j q 2 Pg, then dimèl p è = dimèp è. This is easy to see. If L p shatters çx using the translation vector ç t, then P shatters çx using the translation vector ç t 0, where t 0 i = t i, pèx i è. Second, for any program q 2 L p, the programs q + and q,, computing the positive and negative part of q respectively, can be deæned as follows: è è qèxè if qèxè é 0 qèxè if qèxè é 0 q + èxè := q 0 otherwise, èxè := 0 otherwise The classes L p;+ := fq + j q 2 L p g and L p;, := fq, j q 2 L p g have the property that dimèl p;+ è ç dimèl p è and dimèl p;, è ç dimèl p è. Suppose the subset L 0 p;, of L p;, shatters çx using the translation vector ç t. Then çx is also shattered by Lp using the same ç t. If there exists x i 2 çx and q, 2 L 0 p;, such that q, èx i è = 0 and q, èx i è+t i é 0, then qèx i è+t i é 0. On the other hand, if there exists x i 2 çx and q, 2 L 0 p;, such that q, èx i è = 0 and q, èx i è+t i ç 0, then çx is not shattered by L 0 p;, since every q, 2 L 0 p;, has q,èx i è+t i ç 0. A similar argument can be made for L p;+. Third, for the classes L p;+ and L p;, and for K 2 Q + we can deæne the classes L K p;+ := fq K j q 2 L p;+ g and L K p;, := f,q K j q 2 L p;, g, where q K èxè =qèxèif,kéqèxèék, q K èxè=kif qèxè ç K, and q K èxè =,Kif qèxè ç,k. For both of these classes the pseudo dimension is no more than that of L p using the same argument asabove. 12

It is easy to see that L K p ç L K p;+ + L K p;,. ut To prove our result on testing programs we use the following notions and theorems from the literature. Deænition. èëpol84, Hau92ëè Let F be a family of functions from a set S into R and let M be a probability measure on S. For æé0, the covering number N èæ; F èoff is deæned as the smallest m for which there exist functions g 1 ;:::g m ènot necessarily in F è such that for all f 2 F there is a g i with E M èf; g i è ç æ. The æ-separation number Mèæ; F èoff is deæned as the largest m for which there exists a set H ç F of functions of cardinality m such that for all distinct h i ;h j 2 H, E M èh i ;h j è éæ. The family F has a nonnegative envelope f if fèxè çjfèxèjfor all f 2 F. Theorem 3.2. ëhau92ë Let F be a family of functions with envelope f. Then for any æé0, Mè2æ;FèçNèæ; F è çmèæ; F è: Theorem 3.3. ëpol84, Hau92ë Let F be a family of functions from a set S into ë0;kë, where dimèf è=dé1.let M be aprobability measure ons. Then for all 0 éæçk, ç 2eK Mèæ; F è é 2 æ ln 2eK ç d : æ Theorem 3.4. ëpol86ë Let F be apermissible family of functions from a set S into ë0;kë, and let M be a probability measure on S. Assume ç é 0;0é æ é 1, and m ç 1. Suppose that çx 2 S m is generated bymindependent random draws from S according to M. Then the probability that there exists f 2 F such that d ç è ^Eçx èfè;eèfèè éæis at most 4EèN èæç=8;f jçx èèe,æ2 çm=16k : where E is expected value and F jçx is the restriction of F to çx; that is, F jçx = fèfèx 1 è;:::;fèx m èè j f 2 Fg. Theorem 3.5. ënp87ë If F and G are families of functions with envelopes f and g, then the class with envelope f+g, satisæes F + G := ff + g j f 2 F; g 2 Gg; N è2æ 1 +2æ 2 ;F +GèçNèæ 1 ;FèNèæ 2 ;Gè: 13

We use the above theorems to prove a result for testing programs that is similar to Haussler's result for learning functions. Theorem 3.6. Let P be a class of programs computing functions from Q to Q with dimèp è= dé1, and let M be aprobability measure deæned over the set of inputs Q. Given p 2 P and K 2 Q +, deæne the loss function for any q 2 P to be L K p;q and deæne LK p := flk p;q j q 2 Pg. Also, deæne the probability measure M p on R æ R as above. If a random sequence of test points çx of length m is drawn from Q according to M, and if m ç 16K ç æ 2 4d ln 32eK +ln 16 ç ç æç æ for 0 éæ;æé1and for 0 éçç8k, then the probability that there exists f 2 L K p d ç è ^E çx èfè;eèfèè éæis at most æ. such that Proof. We follow the form of Haussler's proof. By Theorem 3.4., if a random sequence çx of m test points is selected, then the probability that there exists f 2 L K p such that d ç è ^Eçx èfè;eèfèè éæis at most 4EèN èæç=8;l K çm=16k p;jçx èèe,æ2 : Since L K p ç L K p;+ + L K p;, by Lemma 3.1., and using Theorem 3.5., this is at most 4EèN èæç=32; L K p;+;jçx èn èæç=32; LK p;,;jçx èèe,æ2 çm=16k : By Theorems 3.2. and 3.3. and Lemma 3.1. this is at most ç 64eK 16 æç ln 64eK ç 2d e,æ2çm=16k : æç Setting the above bound equal to æ and solving for m gives m ç 16K ç ç ç 64eK 64eK 2d ln ln æ 2 ç æç æç +ln 16 æ ç : Simplifying this expression using the fact that lnèa ln aè é 2 lnèa=2è when a ç 5 gives the ænal expression. ut In the previous theorem, the program p 2 P that is chosen to determine the loss functions and probability measure represents a speciæcation against which other programs must be tested. As stated earlier, the loss function for a tested program q computes its error with 14

respect to the speciæcation p. The empirical estimate ^Eçx èl K p;qè for the loss function of q represents the observed error of q on the test sequence. The expected value EèL K p;q è represents the actual error of q with respect to p. The theorem states that if P has ænite pseudo dimension, then with high probability these two measures of error will be close for all programs in P. With respect to testing this means that any program q that is consistent with p on the test set will have small error for appropriately chosen æ; æ and ç. For example, if æ = 1, 100 æ = 1, and ç = æ and if the number 2 of test points speciæed in the theorem are chosen at random, then with 99è probability any program in P that is consistent with p on these test points will have error less than æ. This is because with 99è probability no program in P will have an error that diæers by more than 1 from its observed error on the random test sequence, when 2 d æ is used to measure this diæerence. For a program q that is consistent with p on the test sequence çx, ^Eçx èl K p;qè =0. Therefore, in order for dæè ^Eçx èl K p;qè;eèl K p;qèè = j ^Eçx èl K p;qè, EèL K p;qèj æ + ^E çx èl K p;qè+eèl K p;qè = to be less than 1 2, the actual error of q, EèLK p;qè, must be less than æ. EèLK p;q è æ + EèL K p;qè In this section we have discussed the relationship between VCP-dimension and pseudo dimension and have shown how pseudo dimension can be used to determine the number of random test points needed to approximately test a program. These dimensions also give an intuitive measure of the testing complexity of a program class, so they can be used to compare the complexity of diæerent program classes. In the following sections we investigate the VCP-dimension of diæerent classes of program segments. 4 Testing Straight Line Programs First we consider the case of testing straight line programs. We deæne P n, the class of straight line programs with n lines of computing code and one output line, as follows: Deænition. Program class P n is deæned by the following schema: P n := fp j pèxè:=y 1 =a 1 æ 1 b 1 ;. y n =a n æ n b n ; outputèy n èg; where æ i 2f+;,;æg and a i ;b i 2 Qëfxgëfy j jjéig: 15

Each line of code in a program from P n either adds, subtracts or multiplies a constant, the input x, or a previous expression to either a constant, the input x, or a previous expression. Thus each line of code uses two operands that are polynomials. Since the set of polynomials with rational coeæcients is a ring, and rings are closed under addition, subtraction, and multiplication, each p 2 P n computes a polynomial over the rationals. Deænition. The class F n of functions computed by P n can be deæned inductively as follows: F 0 := fxgëq F n := ffèhè æ gèhè j f 2 F i ;g2f j ;h2f k ;æ2f+;,;æg;i+j+kçn,1g; where the meaning of fèhè for f 2 F i ;h 2 F k is that function f performs i elementary operations èfrom the set f+;,; ægè using the operands x; h, and any c 2 Q. A function in F n is built up from smaller functions fèhè and gèhè, which havehas their largest common subexpression. The two functions f èhè and gèhè are joined together in the last computing step of a program in P n. Obviously, F n ç F n+k for all k ç 0 since any function in F n can be realized by a program in P n+k that has k lines of code that are not used in computing the ænal expression. 4.1 Lower Bounds on VCP-Dimension of Straight Line Code Before investigating the VCP-dimension of the class P n,we make a few observations. For a class P of polynomials to have VCP-dimension k, itmust contain a program p that changes sign at least k,1 times. Therefore, p must compute a polynomial of degree at least k,1 that has at least k, 1 distinct real zeros. Using Horner's method ëbaa88ë itisknown that any degree k polynomial can be computed with 2k elementary operations èaddition, subtraction, multiplicationè. Therefore VCP-dimensionèP n è is at least b n c+1. Other work has been done 2 ërs72ë to get the number of operations for evaluating a degree k polynomial down to 3k, 2 which means that VCP-dimensionèP n èiseven higher. On the other hand, Borodin and Cook ëbc76ëhave demonstrated a polynomial with 3 bn=3c real, distinct zeros that can be computed with n operations, so just examining the largest number of real, distinct zeros that can occur in a polynomial computed by a program in P n is not suæcient to obtain a good bound on the VCP-dimension. Since Borodin and Cook ëbc76ë have also proven that ëmost" polynomials of degree greater than or equal to èn +2è 2 cannot be computed with n æ operations, even when an unbounded number of multiplication operations are allowed, it appears that VCP-dimensionèP n è is no more than Oèn 2 è. In fact, in the next section we will prove an upper bound that is close to this one. We now examine VCP-dimensionèP n è for some small values of n to get a feel for how the dimension grows with the size of the programs. As a base case, programs in P 0 have no 16

computing lines and can only output a constant or the input x, sovcp-dimensionèp 0 è=1. The class P 1 contains programs with one computing line and can be enumerated by P 1 = fk; kx; x + k; x, k; k, x; x 2 g, where k 2 Q. It is easy to see that VCP-dimensionèP 1 è=2. Proposition 4.1. VCP-dimensionèP 2 è=3. Proof. First we show VCP-dimensionèP 2 è ç 3by showing that the set T = f,2; 0; 2g of 3 points is shattered by P 2. We represent a subset of T by an ordered list of + and, signs where a + sign in the i th position means that the i th smallest element oftis in the subset. So, for example, è+;,; +è represents the subset f,2; 2g of T. The following list shows each subset of T along with a polynomial computed by a program in P 2 that obtains that subset. Each subset in the left column begins with a, sign, and the corresponding subset in the right column is the negation of this subset, so the polynomial that obtains it is the negation of the polynomial in the left column. Subsets in the left column are ordered by increasing number of + signs. è,;,;,è,3 è+; +; +è 3 è,;,; +è x, 1 è+; +;,è 1, x è,; +;,è 2, x 2 è+;,; +è x 2, 2 è,; +; +è x +1 è+;,;,è,1, x Now we show VCP-dimensionèP 2 è ç 3. In order for VCP-dimensionèP 2 è to be greater than 3, it must be possible to shatter a set of points of size at least 4 with P 2. Therefore, P 2 must contain a program that computes a function that obtains the subset è+;,; +;,è for some set of 4 points. However, this can only be done with a polynomial that has at least 3 distinct real zeros, and the only polynomials of degree greater than 2 in P 2 are x 3 and x 4, which only have one real zero. So P 2 cannot shatter a set with more than 3 points. Since VCP-dimensionèP 2 è ç 3 and VCP-dimensionèP 2 è ç 3, VCP-dimensionèP 2 è = 3. ut Proposition 4.2. VCP-dimensionèP 3 è=4. Proof. First we show VCP-dimensionèP 3 è ç 4by showing that the set T = f,3;,1; 1; 3g of 4 points is shattered by P 3. The following list shows each subset of T along with a polynomial computed by a program in P 3 that obtains that subset. As before, subsets beginning with a 17

, sign are in the left column. è,;,;,;,è,3 è+; +; +; +è 3 è,;,;,; +è x, 2 è+; +; +;,è 2, x è,;,; +;,è 1, èx, 1è 2 è+; +;,; +è èx, 1è 2, 1 è,; +;,;,è 1, èx +1è 2 è+;,; +; +è èx +1è 2,1 è,;,;+;+è x è+; +;,;,è,x è,; +;,; +è x 3, 4x è+;,; +;,è 4x, x 3 è,; +; +;,è 2, x 2 è+;,;,; +è x 2, 2 è,; +; +; +è x +2 è+;,;,;,è,2, x Now we show VCP-dimensionèP 3 è ç 4. In order for VCP-dimensionèP 3 è to be greater than 4, P 3 must contain a program that computes a polynomial that has at least 4 distinct real zeros. However, the only polynomials of degree greater than 3 in P 3 are kx 4 ;x 4 +k; èx 2 + kxè 2 ; èx + kè 4 ; èx 2 + kè 2 ;x 4 +kx 2 ;x 4 +x 3 ;x 4 +x; x 5 ;x 6 ;x 8 and none of these have 4 distinct real zeros, so P 3 cannot shatter a set with more than 4 points. ut It can similarly be proven that VCP-dimensionèP 4 è=5. From the above examples we could conjecture that VCP-dimensionèP n è= n+1. This is, in fact, a lower bound for VCP-dimensionèP n è, but as n increases it is no longer an upper bound, as the following two theorems demonstrate. Theorem 4.1. VCP-dimensionèP n è ç n +1. Proof. First we demonstrate how to build an n degree polynomial with n distinct real zeros using n operations, and then we show how a set of n + 1 points can be shattered. If c 1 ;c 2 ;:::c b n are distinct positive rationals, then the following polynomial has 2 c n distinct real zeros èif n is odd, a factor of x is insertedè: fèxè =èx 2,c 2 1èèx 2, c 2 2è :::èx 2,c 2 b n 2 cè This polynomial can be computed in n operations with the following program: pèxè := y 1 =xæx; y 2 =y 1,c 2 1; y 3 =y 1,c 2 2 ; y 4 =y 2 æy 3 ;. y n =xæy n,1 ; èif n is oddè y n = y n,2 æ y n,1 ; èif n is evenè outputèy n è 18

The program p builds b nc degree 2 polynomials, 2 x2, c 2 i, each with two distinct real zeros, c i and,c i, and multiplies these together. If n is odd, p multiplies this polynomial by x to make 0 the n th zero. A set T of n + 1 points can be chosen by choosing a point between every two zeros of f, as well as a point at either end of the zeros of f. The polynomial f obtains one subset of T with n sign changes, and its negation, which can be obtained in n operations by negating one factor èthat is, by using èc 2 i, x 2 èè, obtains the other subset with n sign changes. Given any subset S of T with less than n sign changes, the following steps will give a polynomial computable in n operations which obtains that subset. 1. Build a polynomial similar to f, except skip the factor èx 2, c 2 i è if the points in T on either side of c i or,c i are assigned the same sign by S. This will ësave" two operations. If n is odd and the two points next to 0 are assigned the same sign by S, then remove the factor x. 2. For each c i for which the factor èx 2,c 2 i èwas skipped in step 1, if c i è,c i è has the points on either side of it assigned diæerent signs by S, then place a line through it. That is, insert the additional factor c i, x è,c i, xè into the polynomial. This additional factor will change the sign of all points greater than c i ègreater than,c i è. It requires two operations, the same number as were ësaved" by not inserting the factor èx 2, c 2 i è. 3. If no factors were added in either step 1 or step 2, then S contains no sign changes, so it is given by a constant polynomial. If the polynomial obtained by step 1 assigns a diæerent sign to the smallest point intthan S does, then it must be negated. This can be done by negating one of its factors. The polynomial created by the above steps has a zero between every two points of S where a sign change occurs and has no other zeros. Since each of these zeros actually passes through the x axis, and no zero is shared by two or more factors, the resulting polynomial actually changes sign at each zero. Since step 3 ensures that the smallest point intis assigned the correct sign by the polynomial, all other points will also be assigned the correct sign. ut Theorem 4.2. For né42, VCP-dimensionèP n è én+1. Proof. We demonstrate how, using several ëtricks", we can shatter a set of n+2 points, where né42. For n even, deæne T := f,èn +1è;,èn,1è;:::;,3;,1;1;3;:::;n,1;n+1gand for n odd deæne T := f,n;,èn, 2è;:::;,3;,1;0;1;3;:::;n,2;ng. Observe that any subset of T can be obtained by constructing a polynomial with exactly one zero èa zero where the 19

polynomial actually ëcrosses" the x axisè between each pair of consecutive points in T where a sign change occurs. Since the points chosen for T are symmetric around 0, any subset of T that has at least one pair of sign changes missing èthat is, any subset where there exists a c ç 3 such that,c and,èc, 2è are assigned the same sign and c and c, 2 are assigned the same signè can be obtained using the techniques in the proof of Theorem 4.1.. Also, any subset with three consecutive sign change pairs èthat is, any subset where there exists a cé6 such that,c;,èc, 2è;,èc, 4è;,èc, 6è and èc, 6è; èc, 4è; èc, 2è;c are assigned alternating signsè can be obtained, as the following claim shows. Claim 4.2.a. Any subset of T containing three consecutive sign change pairs can be obtained with n lines of computing code. Proof. Let S be a subset of T with three consecutive sign change pairs, and let c be the constant such that S assigns alternating signs to,c;,èc, 2è;,èc, 4è;,èc, 6è and to èc, 6è; èc, 4è; èc, 2è;c. Assume S assigns a + sign to,c. The following 5 lines of code will build a polynomial that evaluates to a positive number for,c;,èc, 4è; c, 4; c and evaluates to a negative number for,èc, 2è;,èc, 6è;c,6;c,2: y 1 = x æ x; y 2 = y 1, èc, 3è 2 ; y 3 = y 2 2 ; y 4 = y 3, k; y 5 = y 2 æ y 4 ; where k is chosen appropriately. An appropriate k is one that causes y 4 to be positive for,c;,èc, 6è;c,6;c and negative for,èc, 2è;,èc, 4è;c,4;c,2. In other words, such a kmust make the following inequalities hold: èc 2, èc, 3è 2 è 2, k =è6c,9è 2, k =36c 2,108c +81,ké0 èèc, 6è 2, èc, 3è 2 è 2, k =è,6c+ 27è 2, k =36c 2,324c + 729, ké0 èèc, 2è 2, èc, 3è 2 è 2, k =è2c,5è 2, k =4c 2,20c +25,ké0 èèc, 4è 2, èc, 3è 2 è 2, k =è,2c+7è 2,k=4c 2,28c +49,ké0 Akthat satisæes the above inequalities is one that is less than 36c 2, 108c + 81 and 36c 2, 324c+729 and greater than 4c 2,20c+25 and 4c 2,28c+49. Since for cé3;4c 2,20c+25 é 4c 2,28c + 49 and 36c 2, 108c +81é36c 2, 324c + 729, such aksatisæes 4c 2, 20c +25é ké36c 2, 324c + 729. There exists such akif 32c 2, 304c + 704 = 16è2c 2, 19c + 44è é 0. Since this inequality holds for all c ç 6, and the restriction on c is cé6, a k with the correct properties always exists. The above 5 lines of code produce a polynomial with 6 zeros è3 symmetric pairs of zerosè. In addition to the sign changes produced by these zeros, S contains a total of at most d n,5e 2 pairs of sign changes and single sign changes èa single sign change occurs when there exists a positive c such that c and c, 2 are assigned the same sign but,c and,èc, 2è are assigned diæerent signs, or visa versaè. Since each pair of sign changes can be obtained with 20

2 additional lines of code èby including the factor x 2, c i è, and each single sign change can be obtained with 2 additional lines of code èby including the factor x, c i è, at most 2d n,5e 2 additional lines of code are needed to obtain the rest of the sign changes in S. For n odd 2d n,5e = 2èn,5è = n, 5, so the total number of lines of code needed to obtain S is n. For 2 2 n even 2d n,5e = 2èn,4è = n, 4. However, one of the possible sign changes in S is at 0, and 2 2 this sign change can be obtained in one line by adding the factor x, so a total of n lines of code are needed to obtain S in this case too. ut Now wemust demonstrate how to obtain subsets of T that do not have any pairs of sign changes missing and do not contain three consecutive sign change pairs. We claim that such subsets must have one of the following two properties: 1. There exist 3 single sign changes k 1 ék 2 ék 3 èa sign change k i is deæned to be c, 1 where S assigns diæerent signs to c and c, 2è such that k 2, k 1 = k 3, k 2. 2. There exist 4 single sign changes k 1 ék 2 ék 3 ék 4 such that k 2, k 1 = k 4, k 3. A subset of T can be represented by a string of length n + 1 from the alphabet æ = f0; 1; 2g. A2inthei th position of such a string represents a sign change between the i th and the èi +1è st points of T that is one of a symmetric pair of sign changes. A 1 in the i th position represents a sign change between the i th and the èi +1è st points of T that is a single sign change. A0inthei th position represents that no sign change occurs in T between the i th and the èi +1è st points. By searching through longer and longer even length strings èfor n even, the possible sign change at 0 is omittedè it is found that all even length strings of length at least 44 that contain at least one sign change of any possible pair of sign changes and do not contain three consecutive sign change pairs satisfy at least one of the above two properties. ènote: A search program was written and executed to do this.è A string of length 42 that does not satisfy either of the two properties is the following: 221122021202202212122220202212212021220022. If a subset S of T has property 1, then the following 4 lines of code will build a polynomial that changes sign at k 1 ;k 2 and k 3 : y 1 = x, k 2 ; y 2 = y 2 1 ; y 3 = y 2, èk 3, k 2 è 2 ; y 4 = y 3 æ y 1 ; S will contain, in addition to these sign changes, a total of at most d n,5 e pairs of sign changes 2 and single sign changes. A polynomial can be built using n, 5 operations to obtain these sign changes, as explained in the proof of Claim 4.2.a. Using one more operation to multiply the two polynomials together, the subset S can be obtained with n operations. 21

If a subset S of T has property 2, then the following 5 lines of code, where k 5 = k 2 + k 3,k 2 2, will build a polynomial that changes sign at k 1 ;k 2 ;k 3 and k 4 : y 1 = x, k 5 ; y 2 = y 2 1 ; y 3 = y 2, èk 3, k 5 è 2 ; y 4 = y 2, èk 4, k 5 è 2 ; y 5 = y 3 æ y 4 ; In addition to these sign changes, S will contain a total of at most d n,7 e pairs of sign changes 2 and single sign changes, and these can be obtained with n, 7 operations. Therefore S can be obtained with n, 1 operations. We have demonstrated how any subset of T can be obtained using no more than n operations. Therefore, for né42, VCP-dimensionèP n è is at least n +2. ut 4.2 Upper Bounds on VCP-Dimension of Straight Line Code One way to obtain an upper bound on the VCP-dimension of P n is to determine the largest number of real, distinct zeros occurring in any polynomial computed by a program in P n. However, since we mentioned at the beginning of the previous subsection that there are programs in P n that compute polynomials with a number of real, distinct zeros that is exponential in n, this technique does not yield a good upper bound on the VCP-dimension of the class. To obtain better upper bounds we use recent results by Goldberg and Jerrum ëgj93ë in computational learning theory on bounding the VCP-dimension of classes parameterized by real numbers. We use the following theorem from their work. Theorem 4.3. ëgj93ë Let C be a concept class where concepts and instances are represented by k and n real values, respectively. Suppose that the test for membership of an instance x in a concept c consists of an algorithm A k;n taking k + n real inputs representing c and x, whose runtime is t = tèk; nè, and which returns the truth value x 2 c. The algorithm A k;n is allowed to perform conditional jumps èconditioned on equality and inequality of real valuesè and execute the standard arithmetic operations on real numbers è+;,; æ; =è in constant time. Then VC-DimensionèCè=Oèktè. In our terminology a concept is a program, and an instance is an input to the program. If we say that an instance x is in a program p èx 2 pè if and only if pèxè é 0, then the VCP-dimension of a program class is the same as the VC-dimension of a class deæned as in the above theorem. 22

Since we only examine programs taking one rational input, an instance can be represented by one real value. A program p 2 P n can be represented by 5nreal values by using 5 real numbers to encode the syntax of each line of the program. That is, line i of the program can be represented by parameters èp i;1 ;p i;2 ;p i;3 ;p i;4 ;p i;5 è, where p i;1 encodes the operator, p i;2 and p i;3 encode the ærst operand, and p i;4 and p i;5 encode the second operand. The encoding for p i;1 uses ë0" for ë+", ë1" for ë," and ë2" for ëæ". The encoding for p i;2 and p i;4 uses ë,1" for a constant, ë0" for the input x and ëj", where 1 ç j ç n, 1, for ëy j ". If the operand is a constant, then p i;3 or p i;5 is used to store the constant, otherwise this parameter is 0 and is not used. As an example, the following program in P 3 can be represented by p =è2;0;0;0;0;1;1;0;,1;6;0;2;0;0;0è: pèxè := y 1 =xæx; y 2 =y 1,6; y 3 = y 2 + x; outputèy 3 è In order to apply Theorem 4.3., we must showhow to write an algorithm that takes 5n+1 real inputs representing a program p 2 P n and an input x and determines if pèxè é 0. The algorithm takes Oèlog nè time to examine the parameters p i;j ; 1 ç j ç 5 and execute the code for line i of a program. Since a program has n lines of code, the total runtime of the algorithm is Oèn log nè. First the algorithm examines p i;1 to determine the operation to be performed. Next it performs a one-sided binary search over possible values of p i;2 to determine the ærst operand. It does the same thing to determine p i;4. Finally it performs the operation for the i th line of code. The following is part of the code for such an algorithm. l i : l i;æ;p2:1 : l i;æ;p2:2 : l i;æ;p2:3 : l i;æ;p2:4 : if p i;1 = 0 then goto l i;+ if p i;1 = 1 then goto l i;, if p i;2 =,1 then goto l i;æ;c if p i;2 = 0 then goto l i;æ;x if p i;2 é 1 then goto l i;æ;p2:1 goto l i;æ;y1 if p i;2 é 2 then goto l i;æ;p2:2 goto l i;æ;y2 if p i;2 é 4 then goto l i;æ;p2:3 if p i;2 = 3 then goto l i;æ;y3 goto l i;æ;y4 if p i;2 é 8 then goto l i;æ;p2:5 if p i;2 é 6 then goto l i;æ;p2:4 if p i;2 = 5 then goto l i;æ;y5 goto l i;æ;y6 if p i;2 = 7 then goto l i;æ;y7 goto l i;æ;y8 23

l i;æ;p2:5 : l i;æ;y3 : if p i;2 é 16 then goto l i;æ;p2:9. if p i;4 =,1 then goto l i;æ;y3;c if p i;4 = 0 then goto l i;æ;y3;x if p i;4 é 1 then goto l i;æ;y3;p4:1 y i = y 3 æ y 1 goto l i+1 l i;æ;y3;c : y i = y 3 æ p i;5. Using the algorithm described above, we can apply Theorem 4.3. to obtain the following upper bound on the VCP-dimension of the class P n of straight line programs with n lines of computing code. Theorem 4.4. VCP-dimensionèP n è=oèn 2 log nè. The upper bound given in the previous result depends on the operations allowed in a straight line program. If we have a diæerent set of basic operations, then we get diæerent bounds on the VCP-dimension. For example, if we allow not only addition, subtraction and multiplication, but also exponentiation èdenoted by "è and æoor functions èdenoted by bcè, then we ænd that the VCP-dimension of straight line programs is inænite. Deænition. Program class P æ n is deæned by the following schema: P æ n := fp j pèxè:=y 1 =e 1 ;. y n =e n ; outputèy n èg; where e i = ba i c or e i = a i æ i b i and æ i 2f+;,;æ;"g and a i ;b i 2 Qëfxgëfy j jjéig: Each line of code in a program from P æ n either takes the æoor of an operand, performs an exponentiation, or adds, subtracts or multiplies two operands. The operands permitted are constants, the input x, or a previous expression. 24